CAN ONE ESTIMATE THE CONDITIONAL DISTRIBUTION OF
POST-MODEL-SELECTION ESTIMATORS?

By Hannes Leeb¹ and Benedikt M. Pötscher

Yale University and University of Vienna 

We consider the problem of estimating the conditional distribu- 
tion of a post-model-selection estimator where the conditioning is on 
the selected model. The notion of a post-model-selection estimator 
here refers to the combined procedure resulting from first selecting 
a model (e.g., by a model selection criterion such as AIC or by a 
hypothesis testing procedure) and then estimating the parameters in 
the selected model (e.g., by least-squares or maximum likelihood), all 
based on the same data set. We show that it is impossible to esti- 
mate this distribution with reasonable accuracy even asymptotically. 
In particular, we show that no estimator for this distribution can be 
uniformly consistent (not even locally). This follows as a corollary 
to (local) minimax lower bounds on the performance of estimators 
for this distribution. Similar impossibility results are also obtained 
for the conditional distribution of linear functions (e.g., predictors) 
of the post-model-selection estimator. 

Received October 2003; revised November 2005.

¹Supported by the Max Kade Foundation and by Austrian National Science Foundation (FWF) Grant P13868-MAT. A preliminary draft of the material in this paper was already written in 1999.

AMS 2000 subject classifications. 62F10, 62F12, 62J05, 62J07, 62C05.

Key words and phrases. Inference after model selection, post-model-selection estimator, pre-test estimator, selection of regressors, Akaike's information criterion AIC, thresholding, model uncertainty, consistency, uniform consistency, lower risk bound.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2006, Vol. 34, No. 5, 2554-2591. This reprint differs from the original in pagination and typographic detail.

1. Introduction and overview. In many statistical applications a data-based model selection step precedes parameter estimation and inference. For example, the specification of the model (choice of functional form, choice of regressors, number of lags, etc.) is often based on the data. In contrast, the traditional theory of statistical inference is concerned with the properties of estimators and inference procedures under the central assumption of an a priori given model. That is, it is assumed that the model is known to the researcher prior to the statistical analysis, except for the value of the true parameter vector. As a consequence, the actual statistical properties of estimators or inference procedures following a data-driven model selection step are not described by the traditional theory which assumes an a priori given model; in fact, they may differ substantially from the properties predicted by this theory; see, for example, [3, 4], [18], Section 3.3, or [21], Section 12. Ignoring the additional uncertainty originating from the data-driven model selection step and (inappropriately) applying traditional theory can hence result in very misleading conclusions.

Investigations into the distributional properties of post-model-selection 
estimators, that is, of estimators constructed after a data-driven model se- 
lection step, are relatively few and of recent vintage. Sen [23] obtained the 
unconditional large-sample limit distribution of a post-model-selection esti- 
mator in an i.i.d. maximum likelihood framework, when selection is between 
two competing nested models. In [18] the asymptotic properties of a class 
of post-model-selection estimators (based on a sequence of hypothesis tests) 
were studied in a rather general setting covering nonlinear models, depen- 
dent processes and more than two competing models. In that paper, the 
large-sample limit distribution of the post-model-selection estimator was de- 
rived, both unconditional as well as conditional on having chosen a correct 
model, not necessarily the minimal one. See also [20] for further discus- 
sion and a simulation study. The finite-sample distribution of a post-model- 
selection estimator, both unconditional and conditional on having chosen a 
particular (possibly incorrect) model, was derived in [12] in a normal linear 
regression framework; this paper also studied asymptotic approximations 
that are in a certain sense superior to the asymptotic distribution derived 
in [18]. The distributions of corresponding linear predictors constructed af- 
ter model selection were studied in [10, 11]. Related work can also be found 
in [1, 5, 7, 8, 9, 15, 19, 24]. 

It transpires from the papers mentioned above that the finite-sample dis- 
tributions (as well as the large-sample limit distributions) of post-model- 
selection estimators typically depend on unknown model parameters, of- 
ten in a complicated fashion. For inference purposes, for example, for the 
construction of confidence sets, estimators for these distributions would be 
desirable. Consistent estimators for these distributions can typically be con- 
structed quite easily, for example, by suitably replacing unknown param- 
eters in the large-sample limit distributions by consistent estimators; see 
Section 2.2.1. However, the merits of such "plug-in" estimators in small 
samples are questionable: It is known that the convergence of the finite- 
sample distributions to their large-sample limits is typically not uniform 
with respect to the underlying parameters (see [12, 15] and Remark 4.11 
in [14]), and there is no reason to believe that this nonuniformity will dis- 
appear when unknown parameters in the large-sample limit are replaced by 
estimators. This observation is the main motivation for the present paper to 
investigate in general the performance of estimators for the distribution of a
post-model-selection estimator, where the estimators for the distribution are 
not necessarily "plug-in" estimators based on the limiting distribution. In 
particular, we ask whether estimators for the distribution function of post- 
model-selection estimators exist that do not suffer from the nonuniformity 
phenomenon mentioned above. As we show in this paper, the answer in gen- 
eral is "No." We also show that these negative results extend to the problem 
of estimating the distribution of linear functions (e.g., linear predictors) of 
post-model-selection estimators. 

To fix ideas, consider for the moment the linear regression model

$$Y = V\chi + W\psi + u, \tag{1}$$

where $V$ and $W$, respectively, represent $n \times k$ and $n \times l$ nonstochastic regressor matrices ($k \ge 1$, $l \ge 1$), and the $n \times 1$ disturbance vector $u$ is normally distributed with mean zero and variance-covariance matrix $\sigma^2 I_n$. We also assume for the moment that $(V:W)'(V:W)/n$ converges to a nonsingular matrix as the sample size $n$ goes to infinity and that $\lim_{n\to\infty} V'W/n \neq 0$ (for a discussion of the case where this limit is zero, see Example 1 in Section 2.2.2). Now suppose that the vector $\chi$ represents the parameters of interest, while the parameter vector $\psi$ and the associated regressors in $W$ have been entered into the model only to avoid possible misspecification. Suppose further that the necessity to include $\psi$ or some of its components is then checked on the basis of the data, that is, a model selection procedure is used to determine which components of $\psi$ are to be retained in the model, the inclusion of $\chi$ not being disputed. The selected model is then used to obtain the final (post-model-selection) estimator $\tilde\chi$ for $\chi$. We are now interested in the conditional finite-sample distribution of $\tilde\chi$ (appropriately scaled and centered) given the outcome of the model selection step. (The reasons why we concentrate on the conditional rather than on the unconditional distribution are discussed below.) Denote this $k$-dimensional cumulative distribution function (c.d.f.) by $G_{n,\theta,\sigma}(t|\hat M)$, where $\hat M$ stands for the selected model, that is, for the set of selected regressors. As indicated in the notation, this distribution function depends on the true parameters $\theta = (\chi', \psi')'$ and $\sigma$. For the sake of definiteness of discussion, assume for the moment that the model selection procedure used here is the particular "general-to-specific" procedure described at the beginning of Section 2; we comment on other model selection procedures, including Akaike's AIC and thresholding procedures, below.

As mentioned above, it is not difficult to construct a consistent estimator for $G_{n,\theta,\sigma}(t|\hat M)$ for any given $t$, that is, an estimator $\hat G_n(t|\hat M)$ satisfying

$$\lim_{n\to\infty} P_{n,\theta,\sigma}\big(|\hat G_n(t|\hat M) - G_{n,\theta,\sigma}(t|\hat M)| > \delta\big) = 0 \tag{2}$$

for each $\delta > 0$ and each $\theta$, $\sigma$; see Section 2.2.1. However, it follows from the results in Section 2.2.2 that any estimator satisfying (2), that is, any consistent estimator for $G_{n,\theta,\sigma}(t|\hat M)$, necessarily also satisfies

$$\liminf_{n\to\infty} \sup_{\|\theta\| < R} P_{n,\theta,\sigma}\big(|\hat G_n(t|\hat M) - G_{n,\theta,\sigma}(t|\hat M)| > \delta\big) > c > 0 \tag{3}$$

for suitable positive constants $c$, $R$ and $\delta$ (not depending on the estimator), with the lower bound $c$ often being quite large. That is, while the probability in (2) converges to zero for every given $\theta$ by consistency, relation (3) shows that it does not do so uniformly in $\theta$. It follows that $\hat G_n(t|\hat M)$ can never be uniformly consistent (not even when restricting consideration to uniform consistency over all compact subsets of the parameter space). Hence, a large sample size does not guarantee a small estimation error with high probability when estimating the conditional distribution function of a post-model-selection estimator. In this sense, reliably assessing the precision of post-model-selection estimators is an intrinsically hard problem. Apart from (3), we also provide minimax lower bounds for arbitrary (not necessarily consistent) estimators of the conditional distribution function $G_{n,\theta,\sigma}(t|\hat M)$. For example, we provide results that imply that

$$\liminf_{n\to\infty} \inf_{\hat G_n(t|\hat M)} \sup_{\|\theta\| < R} P_{n,\theta,\sigma}\big(|\hat G_n(t|\hat M) - G_{n,\theta,\sigma}(t|\hat M)| > \delta\big) > 0 \tag{4}$$

holds for suitable positive constants $R$ and $\delta$, where the infimum extends over all estimators of $G_{n,\theta,\sigma}(t|\hat M)$. The results in Section 2.2.2 in fact show that the balls $\|\theta\| < R$ in (3) and (4) can be replaced by suitable balls (not necessarily centered at the origin) shrinking at the rate $n^{-1/2}$. This shows that the nonuniformity phenomenon described in (3)-(4) is a local, rather than a global, phenomenon. Moreover, relations (3)-(4) also hold with the unconditional probability $P_{n,\theta,\sigma}(\cdot)$ in (3)-(4) replaced by the conditional probability given model $M$ is selected, that is, given the event $\hat M = M$. In Section 2.2.2 we further show that the nonuniformity phenomenon expressed in (3) and (4) typically also arises when the parameter of interest is not $\chi$, but some other linear transformation of $\theta = (\chi', \psi')'$. As discussed in Remark 4.8, the results also hold for randomized estimators of the conditional distribution function $G_{n,\theta,\sigma}(t|\hat M)$. Hence, no resampling procedure whatsoever can alleviate the problem. This explains the anecdotal evidence in the literature that resampling methods are often unsuccessful in approximating distributional properties of post-model-selection estimators (e.g., [4] or [6]).

The results outlined above are presented in Section 2.2.2 for the particular 
"general-to-specific" model selection procedure described at the beginning 
of Section 2. Analogous results for a large class of model selection proce- 
dures, including Akaike's AIC and thresholding procedures, are then given 
in Section 3 based on the results in Section 2.2.2. In fact, it transpires from
the proofs that the nonuniformity phenomenon expressed in (3)-(4) is not 
specific to the model selection procedures discussed in Sections 2.2 and 3 
of the present paper, but will occur for most (if not all) model selection 
procedures, including consistent ones; see Section 5. 

In the present paper we focus on the conditional distribution of the post- 
model-selection estimator. Given that the outcome of the model selector has 
been observed, it may be argued that the relevant sample space for assessing 
variability of the parameter estimator is then not given by the entire original 
sample space, but rather by the subset that gave rise to the observed out- 
come of the model selector; see the literature on conditional inference ([22] 
and [17], page 421). If one does not adhere to such a conditionality princi- 
ple the unconditional distribution of the post-model-selection estimator is 
of interest. For this case, similar results can be obtained and are reported 
in [13]. 

The plan of the paper is as follows: Post-model-selection estimators based on a "general-to-specific" model selection procedure are the subject of Section 2. After introducing the basic framework and some notation, such as the family of models $M_p$ from which the "general-to-specific" model selection procedure $\hat p$ selects as well as the post-model-selection estimator $\tilde\theta$, the conditional c.d.f. $G_{n,\theta,\sigma}(t|p)$ of (a linear function of) the post-model-selection estimator $\tilde\theta$ given that $\hat p$ selects model $M_p$ is introduced and discussed in Section 2.1. Consistent estimation of $G_{n,\theta,\sigma}(t|p)$ and of $G_{n,\theta,\sigma}(t|\hat p)$ (i.e., of the c.d.f. conditional on the actual outcome of the model selection procedure) is discussed in Section 2.2.1. The main results of the paper are contained in Sections 2.2.2 and 3: In Section 2.2.2 we provide a detailed analysis of the nonuniformity phenomenon encountered in (3)-(4). In Section 3 the "impossibility" result from Section 2.2.2 is extended to a large class of model selection procedures, including Akaike's AIC, and to selection from a nonnested collection of models. Further theoretical results on which the proofs are based are given in Section 4 and conclusions are drawn in Section 5. All proofs, as well as some auxiliary results, are collected into appendices.
Finally, a word on notation: The Euclidean norm is denoted by $\|\cdot\|$, and $\lambda_{\max}(E)$ denotes the largest eigenvalue of a symmetric matrix $E$. A prime denotes transposition of a matrix. For vectors $x$ and $y$, the relation $x \le y$ ($x < y$, resp.) denotes $x_i \le y_i$ ($x_i < y_i$, resp.) for all $i$. As usual, $\Phi$ denotes the standard normal distribution function.

2. Results for post-model-selection estimators based on a "general-to-specific" model selection procedure. Consider the linear regression model

$$Y = X\theta + u, \tag{5}$$
where $X$ is a nonstochastic $n \times P$ matrix with $\operatorname{rank}(X) = P$ and $u \sim N(0, \sigma^2 I_n)$, $\sigma > 0$. Here $n$ denotes the sample size and we assume $n > P \ge 1$. In addition, we assume that $Q = \lim_{n\to\infty} X'X/n$ exists and is nonsingular. In this section we shall, similarly as in [18], consider model selection from the collection of nested models $M_O \subseteq M_{O+1} \subseteq \cdots \subseteq M_P$, where $O$ is specified by the user, and where, for $0 \le p \le P$, the model $M_p$ is given by

$$M_p = \{(\theta_1, \ldots, \theta_P)' \in \mathbb{R}^P : \theta_{p+1} = \cdots = \theta_P = 0\}.$$

(In Section 3 below general nonnested families of models will also be considered.) Clearly, the model $M_p$ corresponds to the situation where only the first $p$ regressors in (5) are included. For the most parsimonious model under consideration, that is, for $M_O$, we assume that $O$ satisfies $0 \le O < P$; if $O > 0$, this model contains as free parameters only those components of the parameter vector $\theta$ that are not subject to model selection. [In the notation used in connection with (1) we then have $\chi = (\theta_1, \ldots, \theta_O)'$ and $\psi = (\theta_{O+1}, \ldots, \theta_P)'$.] Furthermore, note that $M_0 = \{(0, \ldots, 0)'\}$ and that $M_P = \mathbb{R}^P$. We call $M_p$ the regression model of order $p$.

The following notation will prove useful. For matrices $B$ and $C$ of the same row-dimension, the column-wise concatenation of $B$ and $C$ is denoted by $(B : C)$. If $D$ is an $m \times P$ matrix, let $D[p]$ denote the $m \times p$ matrix consisting of the first $p$ columns of $D$. Similarly, let $D[\neg p]$ denote the $m \times (P - p)$ matrix consisting of the last $P - p$ columns of $D$. If $x$ is a $P \times 1$ vector, we write in abuse of notation $x[p]$ and $x[\neg p]$ for $(x'[p])'$ and $(x'[\neg p])'$, respectively. (We shall use the above notation also in the "boundary" cases $p = 0$ and $p = P$. It will always be clear from the context how expressions containing symbols such as $D[0]$, $D[\neg P]$, $x[0]$ or $x[\neg P]$ are to be interpreted.) As usual, the $i$th component of a vector $x$ is denoted by $x_i$, and the entry in the $i$th row and $j$th column of a matrix $B$ is denoted by $B_{i,j}$.

The restricted least-squares estimator of $\theta$ under the restriction $\theta[\neg p] = 0$, that is, under $\theta_{p+1} = \cdots = \theta_P = 0$, will be denoted by $\tilde\theta(p)$, $0 \le p \le P$ (in case $p = P$ the restriction is void). Note that $\tilde\theta(p)$ is given by the $P \times 1$ vector

$$\tilde\theta(p) = \begin{pmatrix} (X[p]'X[p])^{-1} X[p]'Y \\ (0, \ldots, 0)' \end{pmatrix},$$

where the expressions $\tilde\theta(0)$ and $\tilde\theta(P)$, respectively, are to be interpreted as the zero-vector in $\mathbb{R}^P$ and as the unrestricted least-squares estimator of $\theta$. Given a parameter vector $\theta$ in $\mathbb{R}^P$, the order of $\theta$ (relative to the nested sequence of models $M_p$) is defined as

$$p_0(\theta) = \min\{p : 0 \le p \le P,\ \theta \in M_p\}.$$

Hence, if $\theta$ is the true parameter vector, a model $M_p$ is a correct model if and only if $p \ge p_0(\theta)$. We stress that $p_0(\theta)$ is a property of a single parameter, and needs to be distinguished from the notion of the order of the model $M_p$ introduced earlier, which is a property of the set of parameters $M_p$.
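As an illustration only (not code from the paper), the following Python sketch computes the restricted least-squares estimator $\tilde\theta(p)$ and the order $p_0(\theta)$. It assumes NumPy is available; the function names `restricted_ls` and `order` and the tolerance argument `tol` are hypothetical choices made here.

```python
import numpy as np


def restricted_ls(X, Y, p):
    """Restricted least-squares estimator theta_tilde(p), i.e. under theta[not p] = 0.

    Regresses Y on the first p columns of X and pads with zeros to length P;
    p = 0 gives the zero vector and p = P the unrestricted LS estimator.
    """
    P = X.shape[1]
    theta = np.zeros(P)
    if p > 0:
        Xp = X[:, :p]
        # Normal equations X[p]'X[p] b = X[p]'Y (X has full column rank).
        theta[:p] = np.linalg.solve(Xp.T @ Xp, Xp.T @ Y)
    return theta


def order(theta, tol=0.0):
    """p0(theta) = min{p : theta in M_p}: position of the last entry with |theta_i| > tol."""
    nonzero = np.flatnonzero(np.abs(theta) > tol)
    return 0 if nonzero.size == 0 else int(nonzero[-1]) + 1


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
theta = np.array([1.0, -0.5, 0.0])  # theta lies in M_2 but not in M_1
Y = X @ theta                       # noise-free response, for illustration only
print(restricted_ls(X, Y, 2), order(theta))
```

With the noise-free response, the fit in the correct model $M_2$ reproduces $\theta$ exactly, and `order(theta)` is 2, so $M_p$ is a correct model exactly for $p \ge 2$.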

A model selection procedure is now nothing else than a data-driven (measurable) rule $\hat p$ that selects a value from $\{O, \ldots, P\}$ and thus selects a model from the list of candidate models $M_O, \ldots, M_P$. In this section we shall consider as an important leading case a "general-to-specific" model selection procedure based on a sequence of hypothesis tests. (Results for a larger class of model selection procedures, including Akaike's AIC, are provided in Section 3.) This procedure is given as follows: The sequence of hypotheses $H_0^p: p_0(\theta) < p$ is tested against the alternatives $H_1^p: p_0(\theta) = p$ in decreasing order starting at $p = P$. If, for some $p > O$, $H_0^p$ is the first hypothesis in the process that is rejected, we set $\hat p = p$. If no rejection occurs until even $H_0^{O+1}$ is not rejected, we set $\hat p = O$. Each hypothesis in this sequence is tested by a kind of $t$-test where the error variance is always estimated from the overall model (but see the discussion following Theorem 3.1 in Section 3 below for other choices of estimators of the error variance). More formally, we have

$$\hat p = \max\{p : |T_p| \ge c_p,\ 0 \le p \le P\}, \tag{6}$$

with $c_O = 0$ in order to ensure a well-defined $\hat p$ in the range $\{O, O+1, \ldots, P\}$. For $O < p \le P$, the critical values $c_p$ satisfy $0 < c_p < \infty$ and are independent of sample size (but see also Remark 4.7). The test statistics are given by

$$T_p = \frac{\sqrt{n}\, \tilde\theta_p(p)}{\hat\sigma\, \xi_{n,p}} \qquad (0 < p \le P),$$

with the convention that $T_0 = 0$. Furthermore,

$$\xi_{n,p} = \left(\left(\frac{X[p]'X[p]}{n}\right)^{-1}_{p,p}\right)^{1/2} \qquad (0 < p \le P)$$

denotes the nonnegative square root of the $p$th diagonal element of the matrix indicated, and $\hat\sigma^2$ is given by

$$\hat\sigma^2 = (n - P)^{-1}\big(Y - X\tilde\theta(P)\big)'\big(Y - X\tilde\theta(P)\big).$$

Note that, under the hypothesis $H_0^p$, the statistic $T_p$ is $t$-distributed with $n - P$ degrees of freedom for $0 < p \le P$. It is also easy to see that the so-defined model selection procedure $\hat p$ is conservative: The probability of selecting an incorrect model, that is, the probability of the event $\{\hat p < p_0(\theta)\}$, converges to zero as the sample size increases. In contrast, the probability of selecting a correct (but possibly overparameterized) model, that is, the probability of the event $\{\hat p = p\}$ for $p$ satisfying $\max\{p_0(\theta), O\} \le p \le P$, converges to a positive limit; see, for example, Proposition 5.4 and equation (5.7) in [11].
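A minimal sketch of the selector (6) follows, again in Python with NumPy and with hypothetical names (`general_to_specific`, and a list `c` holding $c_1, \ldots, c_P$). It runs the tests in decreasing order and stops at the first rejection, which yields the same value as the maximum in (6); it is an illustration, not the authors' code.

```python
import numpy as np


def general_to_specific(X, Y, c, O=0):
    """Selected order p_hat from (6); c[p - 1] holds the critical value c_p, p = 1, ..., P.

    Tests H_0^P, H_0^{P-1}, ..., H_0^{O+1} in turn and returns the first p whose
    t-type statistic satisfies |T_p| >= c_p; returns O if none does (c_O = 0).
    """
    n, P = X.shape
    # sigma_hat^2 = (n - P)^{-1} ||Y - X theta_tilde(P)||^2, from the overall model.
    theta_full = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ theta_full
    sigma_hat = np.sqrt(resid @ resid / (n - P))
    for p in range(P, O, -1):
        Xp = X[:, :p]
        gram = Xp.T @ Xp
        theta_p = np.linalg.solve(gram, Xp.T @ Y)[p - 1]     # last entry of theta_tilde(p)
        xi = np.sqrt(n * np.linalg.inv(gram)[p - 1, p - 1])  # xi_{n,p}
        T_p = np.sqrt(n) * theta_p / (sigma_hat * xi)
        if abs(T_p) >= c[p - 1]:
            return p
    return O
```

Because the critical values do not grow with $n$, a coefficient that is exactly zero is still retained with a fixed positive probability (about 5% per test when $c_p = 1.96$ and $n - P$ is large), which is the conservativeness described above.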
The post-model-selection estimator $\tilde\theta$ can now be defined as follows: On the event $\hat p = p$, $\tilde\theta$ is given by the restricted least-squares estimator $\tilde\theta(p)$, that is,

$$\tilde\theta = \sum_{p=O}^{P} \tilde\theta(p)\, \mathbf{1}(\hat p = p), \tag{7}$$

where $\mathbf{1}(\cdot)$ denotes the indicator function of the event shown in the argument.

2.1. The conditional finite-sample distribution of the post-model-selection estimator. We now introduce the distribution function of a linear transformation of $\tilde\theta$, conditional on the event $\hat p = p$, and summarize some of its properties that will be needed in the subsequent development. To this end, let $A$ be a nonstochastic $k \times P$ matrix of rank $k$, $1 \le k \le P$. For $O \le p \le P$, we consider the conditional c.d.f.

$$G_{n,\theta,\sigma}(t|p) = P_{n,\theta,\sigma}\big(\sqrt{n}\, A(\tilde\theta - \theta) \le t \,\big|\, \hat p = p\big) \qquad (t \in \mathbb{R}^k). \tag{8}$$

Here $P_{n,\theta,\sigma}(\cdot)$ denotes the probability measure corresponding to a sample of size $n$ from (5), and $P_{n,\theta,\sigma}(\cdot|\hat p = p)$ denotes the associated conditional probability measure (the conditioning event always having positive probability; cf. (3.8)-(3.9) in [11] and the attending discussion). Note that, on the event $\hat p = p$, the expression $A(\tilde\theta - \theta)$ equals $A(\tilde\theta(p) - \theta)$ in view of (7).
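To make (7) and (8) concrete, the Python sketch below approximates $G_{n,\theta,\sigma}(t|p)$ by simulation for a single row $A$ ($k = 1$). It is an illustration under stated assumptions (NumPy, hypothetical function and argument names, a fixed random seed), not a procedure from the paper. Note also that it requires the true $\theta$ and $\sigma$, which is exactly the information an estimator of $G_{n,\theta,\sigma}(t|p)$ does not have.

```python
import numpy as np


def simulate_conditional_cdf(X, theta, sigma, c, A, p, t_grid, O=0, reps=20000, seed=0):
    """Monte Carlo approximation of G_{n,theta,sigma}(t | p) in (8) for a 1 x P row A.

    Returns the empirical c.d.f. of sqrt(n) A (theta_tilde - theta) over the
    replications with p_hat == p, evaluated at t_grid, and the fraction of such
    replications (an estimate of P(p_hat = p)).
    """
    rng = np.random.default_rng(seed)
    n, P = X.shape
    # (X[q]'X[q])^{-1} X[q]' and xi_{n,q} for every model order q = 1, ..., P.
    maps, xis = {}, {}
    for q in range(1, P + 1):
        Xq = X[:, :q]
        gram = Xq.T @ Xq
        maps[q] = np.linalg.solve(gram, Xq.T)
        xis[q] = np.sqrt(n * np.linalg.inv(gram)[q - 1, q - 1])
    draws = []
    for _ in range(reps):
        Y = X @ theta + sigma * rng.standard_normal(n)
        resid = Y - X @ (maps[P] @ Y)
        sigma_hat = np.sqrt(resid @ resid / (n - P))  # from the overall model M_P
        p_hat = O
        for q in range(P, O, -1):                     # general-to-specific tests, eq. (6)
            T_q = np.sqrt(n) * (maps[q] @ Y)[q - 1] / (sigma_hat * xis[q])
            if abs(T_q) >= c[q - 1]:
                p_hat = q
                break
        if p_hat != p:
            continue                                  # condition on the event p_hat = p
        theta_tilde = np.zeros(P)                     # eq. (7): theta_tilde = theta_tilde(p_hat)
        if p_hat > 0:
            theta_tilde[:p_hat] = maps[p_hat] @ Y
        draws.append(np.sqrt(n) * A @ (theta_tilde - theta))
    if not draws:
        return None, 0.0
    draws = np.asarray(draws)
    cdf = np.array([np.mean(draws <= t) for t in t_grid])
    return cdf, draws.size / reps
```

Plotting the returned c.d.f. for several values of $\theta$ near $M_{p-1}$ is a simple way to see how strongly $G_{n,\theta,\sigma}(t|p)$ varies with the unknown parameter.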

Depending on the choice of the matrix $A$, several important scenarios are covered by (8): The conditional c.d.f. of $\sqrt{n}(\tilde\theta - \theta)$ is obtained by setting $A$ equal to the $P \times P$ identity matrix $I_P$. The conditional c.d.f. of the components of $\sqrt{n}(\tilde\theta - \theta)$ that are not restricted to zero in the selected model $M_p$, $p > 0$, is obtained by setting $A$ to the $p \times P$ matrix $(I_p : 0)$. In case $O > 0$, the conditional c.d.f. of those components of $\sqrt{n}(\tilde\theta - \theta)$ which correspond to the parameter of interest $\chi$ in (1) can be studied by setting $A$ to the $O \times P$ matrix $(I_O : 0)$, as we then have $A\theta = (\theta_1, \ldots, \theta_O)' = \chi$. Finally, if $A \neq 0$ is a $1 \times P$ vector, we obtain the conditional distribution of a linear predictor based on the post-model-selection estimator. See the examples at the ends of Section 2.2.2 and Section 4.1 for more discussion.

The c.d.f. $G_{n,\theta,\sigma}(t|p)$ and its properties have been analyzed in detail in [12] and [10]. To be able to access these results, we need some further notation. The expected value of the restricted least-squares estimator $\tilde\theta(p)$ will be denoted by $\eta_n(p)$ and is given by the $P \times 1$ vector

$$\eta_n(p) = \begin{pmatrix} \theta[p] + (X[p]'X[p])^{-1} X[p]'X[\neg p]\, \theta[\neg p] \\ (0, \ldots, 0)' \end{pmatrix}, \tag{9}$$

with the conventions that $\eta_n(0) = (0, \ldots, 0)' \in \mathbb{R}^P$ and that $\eta_n(P) = \theta$. Furthermore, let $\Phi_{n,p}(\cdot)$ denote the c.d.f. of $\sqrt{n}\, A(\tilde\theta(p) - \eta_n(p))$, that is, the c.d.f. of $\sqrt{n}\, A$ times the restricted least-squares estimator based on model $M_p$ centered at its mean. Hence, $\Phi_{n,p}(\cdot)$ is the c.d.f. of a $k$-variate Gaussian random vector with mean zero and variance-covariance matrix $\sigma^2 A[p](X[p]'X[p]/n)^{-1}A[p]'$ in case $p > 0$, and it is the c.d.f. of point-mass at zero in $\mathbb{R}^k$ in case $p = 0$. If $p > 0$ and if the matrix $A[p]$ has full row rank $k$, then $\Phi_{n,p}(\cdot)$ has a density with respect to Lebesgue measure, and we shall denote this density by $\phi_{n,p}(\cdot)$. We note that $\eta_n(p)$ depends on $\theta$ and that $\Phi_{n,p}(\cdot)$ depends on $\sigma$ (in case $p > 0$), although these dependencies are not shown explicitly in the notation.
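The quantities in (9) and the covariance matrix of $\Phi_{n,p}$ can be computed directly. A minimal Python/NumPy sketch, with hypothetical function names `eta` and `phi_cov` (an illustration, not code from [10] or [12]):

```python
import numpy as np


def eta(X, theta, p):
    """eta_n(p) = E[theta_tilde(p)] from (9); entries beyond p are zero."""
    P = X.shape[1]
    out = np.zeros(P)
    if p > 0:
        Xp, Xr = X[:, :p], X[:, p:]  # X[p] and X[not p]
        # theta[p] plus the omitted-variable term (X[p]'X[p])^{-1} X[p]'X[not p] theta[not p]
        out[:p] = theta[:p] + np.linalg.solve(Xp.T @ Xp, Xp.T @ Xr @ theta[p:])
    return out


def phi_cov(X, A, sigma, p):
    """Covariance of Phi_{n,p}: sigma^2 A[p] (X[p]'X[p]/n)^{-1} A[p]' (zero matrix if p = 0)."""
    n = X.shape[0]
    k = A.shape[0]
    if p == 0:
        return np.zeros((k, k))      # point mass at zero in R^k
    Ap, Xp = A[:, :p], X[:, :p]
    return sigma**2 * Ap @ np.linalg.inv(Xp.T @ Xp / n) @ Ap.T
```

Since $\tilde\theta(p)$ is linear in $Y$, $\eta_n(p)$ coincides with the restricted least-squares fit applied to the noise-free response $X\theta$; this gives an easy numerical check of (9).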
For $p > 0$, we introduce

$$b_{n,p} = C_n^{(p)\prime}\big(A[p](X[p]'X[p]/n)^{-1}A[p]'\big)^{-} \tag{10}$$

and

$$\zeta_{n,p}^2 = \xi_{n,p}^2 - C_n^{(p)\prime}\big(A[p](X[p]'X[p]/n)^{-1}A[p]'\big)^{-} C_n^{(p)}, \tag{11}$$

with $C_n^{(p)} = A[p](X[p]'X[p]/n)^{-1} e_p$, where $e_p$ denotes the $p$th standard basis vector in $\mathbb{R}^p$, and $B^-$ denotes a generalized inverse of the matrix $B$. (Observe that $\zeta_{n,p}^2$ is invariant under the choice of the generalized inverse. The same is not necessarily true for $b_{n,p}$, but is true for $b_{n,p}z$ for all $z$ in the column-space of $A[p]$. Also note that (13) below depends on $b_{n,p}$ only through $b_{n,p}z$ with $z$ in the column-space of $A[p]$.) We observe that the vector of covariances between $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ is precisely given by $\sigma^2 n^{-1} C_n^{(p)}$ (and hence does not depend on $\theta$). Furthermore, observe that $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ are uncorrelated if and only if $\zeta_{n,p}^2 = \xi_{n,p}^2$ if and only if $b_{n,p}z = 0$ for all $z$ in the column-space of $A[p]$; see Lemma A.2 in [10].
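The following sketch evaluates (10) and (11) numerically. It assumes NumPy and uses the Moore-Penrose pseudoinverse (`numpy.linalg.pinv`) as the generalized inverse, which is one admissible choice; the function name `selection_quantities` is hypothetical.

```python
import numpy as np


def selection_quantities(X, A, p):
    """Return xi_{n,p}, zeta_{n,p}, C_n^{(p)} and b_{n,p} from (10)-(11), for p > 0.

    The pseudoinverse serves as generalized inverse: zeta is invariant to this choice,
    while b_{n,p} is only determined on the column space of A[p].
    """
    n = X.shape[0]
    Xp, Ap = X[:, :p], A[:, :p]
    M_inv = np.linalg.inv(Xp.T @ Xp / n)       # (X[p]'X[p]/n)^{-1}
    e_p = np.zeros(p)
    e_p[p - 1] = 1.0
    C = Ap @ M_inv @ e_p                        # C_n^{(p)}
    S_pinv = np.linalg.pinv(Ap @ M_inv @ Ap.T)  # generalized inverse of A[p](...)^{-1}A[p]'
    b = C @ S_pinv                              # b_{n,p}
    xi2 = M_inv[p - 1, p - 1]                   # xi_{n,p}^2
    zeta2 = xi2 - C @ S_pinv @ C                # zeta_{n,p}^2
    return np.sqrt(xi2), np.sqrt(max(zeta2, 0.0)), C, b
```

For an orthonormal design ($X'X/n = I$) and $A = (1, 0, \ldots, 0)$ one gets $C_n^{(2)} = 0$ and hence $\zeta_{n,2} = \xi_{n,2}$ (the uncorrelated case), while $A = I_P$ gives $\zeta_{n,p} = 0$ (the perfectly correlated case).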

Finally, for a univariate Gaussian random variable $N$ with zero mean and variance $s^2$, $s \ge 0$, we write $\Delta_s(a, b)$ for $P(|N - a| < b)$, $a \in \mathbb{R} \cup \{-\infty, \infty\}$, $b \in \mathbb{R}$. Note that $\Delta_s(\cdot, \cdot)$ is symmetric around zero in its first argument, and that $\Delta_s(-\infty, b) = \Delta_s(\infty, b) = 0$ holds. In case $s = 0$, $N$ is to be interpreted as being equal to zero; hence, $a \mapsto \Delta_0(a, b)$ reduces to the indicator function of the interval $(-b, b)$.
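For $s > 0$ and $b > 0$ one has $\Delta_s(a, b) = \Phi((a+b)/s) - \Phi((a-b)/s)$, which gives a direct way to evaluate it. A small Python sketch using only the standard library (`math.erf`); the names `std_normal_cdf` and `Delta` are hypothetical:

```python
import math


def std_normal_cdf(x):
    """Standard normal c.d.f. via the error function (also valid for +/- inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def Delta(s, a, b):
    """Delta_s(a, b) = P(|N - a| < b) for N ~ N(0, s^2), s >= 0.

    a = +/- inf gives 0, b <= 0 gives 0, and s = 0 gives the indicator of |a| < b.
    """
    if b <= 0 or math.isinf(a):
        return 0.0
    if s == 0:
        return 1.0 if abs(a) < b else 0.0
    return std_normal_cdf((a + b) / s) - std_normal_cdf((a - b) / s)
```

For example, `Delta(1.0, 0.0, 1.96)` is approximately 0.95, the acceptance probability of a two-sided test at the usual 5% level.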

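We are now in a position to present the explicit formulae for $G_{n,\theta,\sigma}(t|p)$ derived in [10]. In case $p = O$ we have

$$G_{n,\theta,\sigma}(t|O) = \Phi_{n,O}\big(t - \sqrt{n}\, A(\eta_n(O) - \theta)\big), \tag{12}$$

that is, the c.d.f. of (a linear function of) the post-model-selection estimator $\tilde\theta$ conditional on $\hat p = O$ coincides with the c.d.f. of (this linear function of) the restricted least-squares estimator $\tilde\theta(O)$. However, in case $p > O$ we have

$$G_{n,\theta,\sigma}(t|p) = \int_{z \le t - \sqrt{n}\, A(\eta_n(p) - \theta)} m_{n,p,\theta,\sigma}(z)\, \Phi_{n,p}(dz). \tag{13}$$

In the above display, $\Phi_{n,p}(dz)$ denotes integration with respect to the measure induced by the normal c.d.f. $\Phi_{n,p}(\cdot)$ on $\mathbb{R}^k$ and the integrand $m_{n,p,\theta,\sigma}(z)$ is given by

$$m_{n,p,\theta,\sigma}(z) = \frac{\int_0^\infty \big[1 - \Delta_{\sigma\zeta_{n,p}}\big(\sqrt{n}\,\eta_{n,p}(p) + b_{n,p} z,\ s c_p \sigma \xi_{n,p}\big)\big] \prod_{q=p+1}^{P} \Delta_{\sigma\xi_{n,q}}\big(\sqrt{n}\,\eta_{n,q}(q),\ s c_q \sigma \xi_{n,q}\big)\, h(s)\, ds}{P_{n,\theta,\sigma}(\hat p = p)}, \tag{14}$$

where $\eta_{n,q}(q)$ denotes the $q$th component of $\eta_n(q)$, $\zeta_{n,p}$ is the nonnegative root of $\zeta_{n,p}^2$, and the model selection probability $P_{n,\theta,\sigma}(\hat p = p)$ is given by

$$P_{n,\theta,\sigma}(\hat p = p) = \int_0^\infty \big[1 - \Delta_{\sigma\xi_{n,p}}\big(\sqrt{n}\,\eta_{n,p}(p),\ s c_p \sigma \xi_{n,p}\big)\big] \prod_{q=p+1}^{P} \Delta_{\sigma\xi_{n,q}}\big(\sqrt{n}\,\eta_{n,q}(q),\ s c_q \sigma \xi_{n,q}\big)\, h(s)\, ds. \tag{15}$$

In the two displays above, $h$ denotes the density of $\hat\sigma/\sigma$, that is, $h$ is the density of $(n - P)^{-1/2}$ times the square root of a chi-square distributed random variable with $n - P$ degrees of freedom. The conditional finite-sample distribution of the post-model-selection estimator given in (13) is not normal; for example, it can be bimodal; see Figure 2 in [11]. An exception where (13) is normal is the case where $C_n^{(p)} = 0$, that is, when $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ are uncorrelated; see [10], Section 3.3, for more discussion. At the other extreme, namely, if $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ are perfectly correlated in the sense that $\zeta_{n,p} = 0$ holds, the function $\Delta_{\sigma\zeta_{n,p}}$ appearing in (14) reduces to an indicator function. This is, for example, the case if $A = I_P$ or if $A = (I_p : 0)$.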
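Each factor in the model selection probability can be written as $\Delta_{\sigma\xi_{n,q}}(\sqrt{n}\,\eta_{n,q}(q), s c_q \sigma\xi_{n,q}) = \Phi(\lambda_q + s c_q) - \Phi(\lambda_q - s c_q)$ with $\lambda_q = \sqrt{n}\,\eta_{n,q}(q)/(\sigma\xi_{n,q})$. Hence, for given $n - P$ and critical values, $P_{n,\theta,\sigma}(\hat p = p)$ depends on $\theta$, $\sigma$ and $X$ only through $\lambda_1, \ldots, \lambda_P$. The Python sketch below uses this to evaluate the selection probability numerically with a midpoint rule, writing $h$ explicitly as the density $h(s) = 2(\nu/2)^{\nu/2} s^{\nu-1} e^{-\nu s^2/2}/\Gamma(\nu/2)$ of $\sqrt{\chi^2_\nu/\nu}$, $\nu = n - P$. The truncation point `s_max`, the grid size and all names are ad hoc, hypothetical choices; for $p = O$ the first factor equals 1 because $c_O = 0$.

```python
import math


def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def h_density(s, nu):
    """Density of sigma_hat / sigma = sqrt(chi2_nu / nu) at s, computed on the log scale."""
    if s <= 0:
        return 0.0
    log_f = (math.log(2.0) + (nu / 2) * math.log(nu / 2) - math.lgamma(nu / 2)
             + (nu - 1) * math.log(s) - nu * s * s / 2)
    return math.exp(log_f)


def selection_probability(p, lam, c, nu, O=0, grid=20000, s_max=8.0):
    """P(p_hat = p) for the general-to-specific selector, by midpoint-rule integration over s.

    lam[q - 1] = sqrt(n) eta_{n,q}(q) / (sigma xi_{n,q}) and c[q - 1] = c_q, q = 1, ..., P.
    """
    P = len(lam)

    def delta(q, s):
        # Delta_{sigma xi}(sqrt(n) eta, s c sigma xi) in standardized form
        return Phi(lam[q - 1] + s * c[q - 1]) - Phi(lam[q - 1] - s * c[q - 1])

    ds = s_max / grid
    total = 0.0
    for i in range(grid):
        s = (i + 0.5) * ds
        first = 1.0 if p == O else 1.0 - delta(p, s)  # rejection of H_0^p
        prod = 1.0
        for q in range(p + 1, P + 1):                  # no rejection of H_0^q, q > p
            prod *= delta(q, s)
        total += first * prod * h_density(s, nu) * ds
    return total
```

Summing over $p = O, \ldots, P$ telescopes to $\int_0^\infty h(s)\,ds = 1$, which is a convenient sanity check; with all $\lambda_q = 0$, $P(\hat p = P)$ is the size of a two-sided $t$-test.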

2.2. Estimators of the conditional finite-sample distribution. For the purpose of inference after model selection, the conditional finite-sample distribution of the post-model-selection estimator is an object of particular interest. As we have seen, it depends on unknown parameters in a complicated manner, and, hence, one will have to be satisfied with estimators of this c.d.f. The object we would primarily like to estimate is

$$G_{n,\theta,\sigma}(t|\hat p) = \sum_{p=O}^{P} G_{n,\theta,\sigma}(t|p)\, \mathbf{1}(\hat p = p),$$

that is, the conditional c.d.f. after the model selection procedure has returned the model order $\hat p$. As we shall see in Section 2.2.1, it is not difficult to construct consistent estimators for $G_{n,\theta,\sigma}(t|\hat p)$. We note that in considering consistency of an estimator of $G_{n,\theta,\sigma}(t|\hat p)$ one is evaluating the performance of such an estimator in an unconditional manner, namely, over the entire sample space. One can also take a conditional view in such an evaluation and ask if the given estimator of $G_{n,\theta,\sigma}(t|\hat p)$ is "consistent conditionally on the outcome $\hat p = p$," at least for those parameter values $\theta$ that lead to a positive limit of $P_{n,\theta,\sigma}(\hat p = p)$, which are precisely all $\theta \in M_p$, as shown in Proposition A.2 in Appendix A. Of course, this then reduces to the question of (conditional) consistency of estimators of $G_{n,\theta,\sigma}(t|p)$, also discussed in Section 2.2.1 below.

Despite the consistency results in Section 2.2.1, we shall find in Section 2.2.2 that any estimator of $G_{n,\theta,\sigma}(t|\hat p)$ typically performs unsatisfactorily, in that the estimation error cannot become small uniformly over (subsets of) the parameter space even as sample size goes to infinity. In particular, no uniformly consistent estimators exist, not even locally. These results rest on parallel results for the estimation of $G_{n,\theta,\sigma}(t|p)$ with fixed $p$, which are collected in Section 4.1 below.



2.2.1. Consistent estimators. We construct consistent estimators for $G_{n,\theta,\sigma}(t|\hat p)$ and $G_{n,\theta,\sigma}(t|p)$ (consistent over $M_p$ in the latter case) by commencing from the asymptotic distribution. The large-sample limit of $G_{n,\theta,\sigma}(t|p)$ for $\theta \in M_p$ is given by $G_{\infty,\theta,\sigma}(t|p) = \Phi_{\infty,p}(t)$ in case $p = \max\{p_0(\theta), O\}$, and by

$$G_{\infty,\theta,\sigma}(t|p) = \int_{z \le t} \frac{1 - \Delta_{\sigma\zeta_{\infty,p}}(b_{\infty,p} z,\ c_p \sigma \xi_{\infty,p})}{1 - \Delta_{\sigma\xi_{\infty,p}}(0,\ c_p \sigma \xi_{\infty,p})}\, \Phi_{\infty,p}(dz) \tag{16}$$

in case $p > \max\{p_0(\theta), O\}$. This follows from Proposition A.1 in Appendix A with $\gamma = 0$ and $\sigma^{(n)} = \sigma$. Here $\Phi_{\infty,p}$ is the c.d.f. of a $k$-variate Gaussian random vector with mean zero and variance-covariance matrix $\sigma^2 A[p] Q[p:p]^{-1} A[p]'$, $0 < p \le P$, where $Q[p:p]$ represents the leading diagonal $p \times p$ submatrix of $Q$, and $\xi_{\infty,p}$, $\zeta_{\infty,p}$ and $b_{\infty,p}$ denote the quantities defined as $\xi_{n,p}$, $\zeta_{n,p}$ and $b_{n,p}$ in Section 2.1 but with $Q[p:p]$ replacing $X[p]'X[p]/n$. Also, let $\Phi_{\infty,0}(\cdot)$ denote the c.d.f. of point-mass at zero in $\mathbb{R}^k$. Note that $G_{\infty,\theta,\sigma}(t|p)$, for $p > O$, depends on $\theta$, as it follows two different formulas depending on whether $\theta \in M_p \setminus M_{p-1}$ or $\theta \in M_{p-1}$. Let $\hat\Phi_{n,p}(\cdot)$ denote the c.d.f. of a $k$-variate Gaussian random vector with mean zero and variance-covariance matrix $\hat\sigma^2 A[p](X[p]'X[p]/n)^{-1}A[p]'$, $0 < p \le P$; we also adopt the convention that $\hat\Phi_{n,0}(\cdot)$ denotes the c.d.f. of point-mass at zero in $\mathbb{R}^k$. [We use the same convention for $\hat\Phi_{n,p}(\cdot)$ in case $\hat\sigma = 0$, which is a probability zero event.] For given $p$, $O \le p \le P$, an estimator $\hat G_n(t|p)$ for $G_{n,\theta,\sigma}(t|p)$ is now defined as follows: For $p = O$, we set $\hat G_n(t|O) = \hat\Phi_{n,O}(t)$. For $p > O$, we first employ an auxiliary procedure that consistently decides between $p_0(\theta) = p$ and $p_0(\theta) < p$, that is, between $\theta \in M_p \setminus M_{p-1}$ and $\theta \in M_{p-1}$, for every $\theta \in M_p$. [E.g., the procedure that decides for $p_0(\theta) = p$ whenever $|T_p| > s_{n,p}$ and for $p_0(\theta) < p$ otherwise, with $s_{n,p}$ satisfying $s_{n,p} \to \infty$ and $s_{n,p} = o(n^{1/2})$ for $n \to \infty$, can be used. Alternatively, a consistent model selection procedure such as BIC could be employed to select between $M_{p-1}$ and $M_p \setminus M_{p-1}$.] If the procedure decides for $p_0(\theta) = p$, we set $\hat G_n(t|p) = \hat\Phi_{n,p}(t)$; otherwise we set $\hat G_n(t|p)$ equal to the expression in (16) with $\hat\sigma$, $b_{n,p}$, $\xi_{n,p}$, $\zeta_{n,p}$ and $\hat\Phi_{n,p}(\cdot)$ replacing $\sigma$, $b_{\infty,p}$, $\xi_{\infty,p}$, $\zeta_{\infty,p}$ and $\Phi_{\infty,p}(\cdot)$, respectively. A little reflection shows that $\hat G_n(t|p)$ is again a c.d.f. (This is trivial if $\hat\sigma = 0$, and follows for $\hat\sigma > 0$ from the observation that then $\hat G_n(t|p)$ is either a normal c.d.f. or coincides with the conditional c.d.f. $G^*_{n,\theta,\sigma}(t|p)$ given in (13) of [10] with $\sigma$ replaced by $\hat\sigma$.)
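The construction of $\hat G_n(t|p)$ can be sketched for a single row $A$ ($k = 1$). On the $p_0(\theta) < p$ branch the estimator integrates the ratio $[1 - \Delta_{\hat\sigma\zeta_{n,p}}(b_{n,p}z, c_p\hat\sigma\xi_{n,p})]/[1 - \Delta_{\hat\sigma\xi_{n,p}}(0, c_p\hat\sigma\xi_{n,p})]$ against $\hat\Phi_{n,p}(dz)$ over $z \le t$; the denominator equals $2(1 - \Phi(c_p))$. The Python/NumPy code below is a hypothetical illustration, not the authors' implementation: it uses $s_{n,p} = \log n$ in the auxiliary test (one choice satisfying $s_{n,p} \to \infty$ and $s_{n,p} = o(n^{1/2})$, not one prescribed in the text), assumes $A[p] \neq 0$, and evaluates the integral by a midpoint rule on $[-10\,\mathrm{sd}, t]$.

```python
import math

import numpy as np


def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def Delta(s, a, b):
    """Delta_s(a, b) = P(|N - a| < b), N ~ N(0, s^2); s = 0 gives an indicator."""
    if b <= 0:
        return 0.0
    if s == 0:
        return 1.0 if abs(a) < b else 0.0
    return Phi((a + b) / s) - Phi((a - b) / s)


def G_hat(X, Y, A, c, p, t, O=0, grid=4000):
    """Plug-in estimator G_hat_n(t | p) for a single row A (k = 1); hypothetical choices as stated."""
    n, P = X.shape
    theta_full = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ theta_full
    s_hat = math.sqrt(resid @ resid / (n - P))      # sigma_hat
    if p == 0:
        return 1.0 if t >= 0 else 0.0               # point mass at zero
    Xp, Ap = X[:, :p], A[:p]
    M_inv = np.linalg.inv(Xp.T @ Xp / n)
    S = Ap @ M_inv @ Ap                             # A[p](X[p]'X[p]/n)^{-1}A[p]'
    sd = s_hat * math.sqrt(S)                       # Phi_hat_{n,p} is N(0, sd^2)
    if p == O:
        return Phi(t / sd)
    xi = math.sqrt(M_inv[p - 1, p - 1])
    T_p = math.sqrt(n) * np.linalg.solve(Xp.T @ Xp, Xp.T @ Y)[p - 1] / (s_hat * xi)
    if abs(T_p) > math.log(n):                      # auxiliary test decides p0(theta) = p
        return Phi(t / sd)
    C = Ap @ M_inv[:, p - 1]                        # C_n^{(p)}, a scalar when k = 1
    b = C / S                                       # b_{n,p}
    zeta = math.sqrt(max(xi**2 - C * C / S, 0.0))   # zeta_{n,p}
    cp = c[p - 1]
    denom = 1.0 - Delta(s_hat * xi, 0.0, cp * s_hat * xi)
    lo, hi = -10.0 * sd, min(t, 10.0 * sd)
    if hi <= lo:
        return 0.0
    dz = (hi - lo) / grid
    total = 0.0
    for i in range(grid):
        z = lo + (i + 0.5) * dz
        weight = 1.0 - Delta(s_hat * zeta, b * z, cp * s_hat * xi)
        density = math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        total += weight * density * dz
    return total / denom
```

Both branches give a c.d.f. symmetric about zero, so $\hat G_n(0|p) \approx 1/2$ regardless of which branch the auxiliary test chooses; the branches differ in the tails, and near $M_{p-1}$ the choice between them is exactly where finite-sample trouble arises.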

This gives an estimator $\hat G_n(t|p)$ of $G_{n,\theta,\sigma}(t|p)$; as an estimator of $G_{n,\theta,\sigma}(t|\hat p)$, we shall use $\hat G_n(t|\hat p) = \sum_{p=O}^{P} \hat G_n(t|p)\, \mathbf{1}(\hat p = p)$. We have the following consistency results.

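Proposition 2.1. Let $p$ satisfy $O \le p \le P$. Then the estimator $\hat G_n(t|p)$ is consistent (in the total variation distance) for $G_{n,\theta,\sigma}(t|p)$ and $G_{\infty,\theta,\sigma}(t|p)$ over the subset $M_p$ (and over $0 < \sigma < \infty$). That is, for every $\delta > 0$,

$$\lim_{n\to\infty} P_{n,\theta,\sigma}\big(\|\hat G_n(\cdot|p) - G_{n,\theta,\sigma}(\cdot|p)\|_{TV} > \delta\big) = 0, \tag{17}$$

$$\lim_{n\to\infty} P_{n,\theta,\sigma}\big(\|\hat G_n(\cdot|p) - G_{\infty,\theta,\sigma}(\cdot|p)\|_{TV} > \delta\big) = 0 \tag{18}$$

for all $\theta \in M_p$ and all $\sigma > 0$. The results (17) and (18) also hold with $P_{n,\theta,\sigma}(\cdot|\hat p = p)$ replacing $P_{n,\theta,\sigma}(\cdot)$.

Corollary 2.2. The estimator $\hat G_n(t|\hat p)$ is consistent (in the total variation distance) for $G_{n,\theta,\sigma}(t|\hat p)$ over the entire parameter space, that is, for every $\delta > 0$,

$$\lim_{n\to\infty} P_{n,\theta,\sigma}\big(\|\hat G_n(\cdot|\hat p) - G_{n,\theta,\sigma}(\cdot|\hat p)\|_{TV} > \delta\big) = 0$$

for all $\theta \in \mathbb{R}^P$ and $\sigma > 0$.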

While the estimators constructed above are consistent, they can be expected to perform poorly in finite samples when the true $\theta$ belongs to $M_p \setminus M_{p-1}$ but is "close" to $M_{p-1}$, since the auxiliary decision procedure (although being consistent) will then have difficulties making the correct decision in finite samples, and since $G_{n,\theta,\sigma}(\cdot|p)$ typically does not converge uniformly with respect to $\theta \in M_p \setminus M_{p-1}$ "close" to $M_{p-1}$ (cf. [12, 15] and Remark 4.11 in [14]). In the next section we show that this poor performance is not particular to the estimators constructed above, but is a genuine feature of the estimation problem under consideration.

2.2.2. Performance limits and an impossibility result. We now provide lower bounds for the performance of estimators of the conditional c.d.f. $G_{n,\theta,\sigma}(t|\hat p)$ of the post-model-selection estimator $A\tilde\theta$; that is, we give lower bounds on the worst-case probability that the estimation error exceeds a certain threshold. These lower bounds are often quite large; furthermore, they remain lower bounds even if one restricts attention only to certain subsets of the parameter space that shrink at the rate $n^{-1/2}$. In this sense the "impossibility" results are of a local nature. In particular, the lower bounds imply that no uniformly consistent estimator of the conditional c.d.f. $G_{n,\theta,\sigma}(t|\hat p)$ exists, not even locally. Similar results under a conditional evaluation of the estimation error are given in Section 4.1 and form the theoretical backbone for the results in the present section. We note already here that the lower bounds obtained in Section 4.1 are as large as 1 or 1/2, depending on the particular situation considered.

In the following, the asymptotic "correlation" between $A\tilde\theta(p)$ and $\tilde\theta_p(p)$, as measured by $C_\infty^{(p)} = \lim_{n\to\infty} C_n^{(p)}$, will play an important role. Note that $C_\infty^{(p)}$ equals $A[p]Q[p:p]^{-1}e_p$, and hence, does not depend on the unknown parameters $\theta$ or $\sigma$. In the important special case discussed in the Introduction [cf. (1)], the matrix $A$ equals the $O \times P$ matrix $(I_O : 0)$, and the condition $C_\infty^{(p)} \neq 0$ reduces to the condition that the regressor corresponding to the $p$th column of $(V : W)$ is asymptotically correlated with at least one of the regressors corresponding to the columns of $V$. See Example 1 below for more discussion.
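Because $C_\infty^{(p)} = A[p]Q[p:p]^{-1}e_p$ depends only on $A$ and $Q$, whether Theorem 2.3 or Proposition 2.4 applies can be checked mechanically. A minimal sketch, assuming $A$ and $Q$ are given as NumPy arrays and using a hypothetical numerical tolerance to decide "$\neq 0$"; the helper names and the example matrices are illustrative only.

```python
import numpy as np


def c_infinity(A, Q, p):
    """C_inf^(p) = A[p] Q[p:p]^{-1} e_p, where A[p] = first p columns of A."""
    e_p = np.zeros(p)
    e_p[p - 1] = 1.0
    return A[:, :p] @ np.linalg.solve(Q[:p, :p], e_p)


def largest_correlated_order(A, Q, O, tol=1e-12):
    """Largest q with O < q <= P and C_inf^(q) != 0, or None if there is none.

    None corresponds to the exceptional case of Proposition 2.4; otherwise the
    returned value plays the role of q* in Theorem 2.3.
    """
    P = Q.shape[0]
    for q in range(P, O, -1):
        if np.linalg.norm(c_infinity(A, Q, q)) > tol:
            return q
    return None


# Setting of the Introduction with O = 1 column in V and two in W (hypothetical).
A = np.array([[1.0, 0.0, 0.0]])          # A = (I_O : 0)
Q_orth = np.eye(3)                        # lim V'W/n = 0
Q_corr = np.array([[1.0, 0.0, 0.5],
                   [0.0, 1.0, 0.0],
                   [0.5, 0.0, 1.0]])      # V correlated with the last column of W
```

With `Q_orth` no order qualifies, so the sketch returns `None`; with `Q_corr` it returns 3, i.e. $q^* = P$.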

In the result to follow we shall consider performance limits for estimators of $G_{n,\theta,\sigma}(t\mid\hat p)$ at a fixed value of the argument $t$. An estimator of $G_{n,\theta,\sigma}(t\mid\hat p)$ is now nothing else than a real-valued random variable $\Gamma_n = \Gamma_n(Y, X)$. For mnemonic reasons, we shall, however, use the symbol $\hat G_n(t\mid\hat p)$ instead of $\Gamma_n$ to denote an arbitrary estimator of $G_{n,\theta,\sigma}(t\mid\hat p)$. This notation should not be taken as implying that the estimator is obtained by evaluating an estimated c.d.f. at the argument $t$, or that it is a priori constrained to lie between zero and one. We shall use this notational convention mutatis mutandis also in subsequent sections. Regarding the nonuniformity phenomenon, we then have a dichotomy which is described in the following two results.

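Theorem 2.3. Suppose that $A\tilde\theta(q)$ and $\tilde\theta_q(q)$ are asymptotically correlated, that is, $C_\infty^{(q)} \neq 0$, for some $q$ satisfying $O < q \le P$, and let $q^*$ denote the largest $q$ with this property. Then the following holds for each $\theta \in M_{q^*-1}$, $0 < \sigma < \infty$, and each $t \in \mathbb R^k$: There exist $\delta_0 > 0$ and $0 < \rho_0 < \infty$ such that any estimator $\hat G_n(t\mid\hat p)$ of $G_{n,\theta,\sigma}(t\mid\hat p)$ satisfying

$$P_{n,\theta,\sigma}\big(|\hat G_n(t\mid\hat p) - G_{n,\theta,\sigma}(t\mid\hat p)| > \delta\big) \xrightarrow[n\to\infty]{} 0 \tag{19}$$

for each $\delta > 0$ (in particular, every estimator that is consistent) also satisfies

$$\liminf_{n\to\infty}\ \sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat G_n(t\mid\hat p) - G_{n,\vartheta,\sigma}(t\mid\hat p)| > \delta_0\big) \ \ge\ 2\big(1-\Phi(c_{q^*})\big)\prod_{q=q^*+1}^{P}\big(2\Phi(c_q)-1\big) > 0. \tag{20}$$

The constants $\delta_0$ and $\rho_0$ may be chosen in such a way that they depend only on $t$, $Q$, $A$, $\sigma$ and the critical value $c_{q^*}$. Moreover,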

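$$\liminf_{n\to\infty}\ \inf_{\hat G_n(t\mid\hat p)}\ \sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat G_n(t\mid\hat p) - G_{n,\vartheta,\sigma}(t\mid\hat p)| > \delta_0\big) > 0 \tag{21}$$

and

$$\sup_{\delta>0}\ \liminf_{n\to\infty}\ \inf_{\hat G_n(t\mid\hat p)}\ \sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat G_n(t\mid\hat p) - G_{n,\vartheta,\sigma}(t\mid\hat p)| > \delta\big) \ \ge\ \big(1-\Phi(c_{q^*})\big)\prod_{q=q^*+1}^{P}\big(2\Phi(c_q)-1\big) > 0 \tag{22}$$

hold, where the infima in (21) and (22) extend over all estimators $\hat G_n(t\mid\hat p)$ of $G_{n,\theta,\sigma}(t\mid\hat p)$. [The lower bound in (20) is nothing else than $\lim_{n\to\infty} P_{n,\theta,\sigma}(\hat p = q^*)$.]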
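To get a feel for the size of the lower bounds in (20) and (22), they can be evaluated numerically. The reading behind the product formula, which is consistent with the bracketed remark above, is that for $\theta \in M_{q^*-1}$ the event $\hat p = q^*$ requires $|T_q| \le c_q$ for all $q > q^*$ and $|T_{q^*}| > c_{q^*}$, with the test statistics behaving asymptotically like independent standard normals. The critical values below are hypothetical, as are the function names.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """Standard normal c.d.f. Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def limit_prob_select(q_star, c):
    """Right-hand side of (20): 2(1 - Phi(c_{q*})) * prod_{q > q*} (2 Phi(c_q) - 1).

    c is indexed so that c[q] is the critical value c_q for q = 1, ..., P
    (c[0] is unused); P is inferred as len(c) - 1.
    """
    P = len(c) - 1
    prob = 2.0 * (1.0 - std_normal_cdf(c[q_star]))
    for q in range(q_star + 1, P + 1):
        prob *= 2.0 * std_normal_cdf(c[q]) - 1.0
    return prob


crit = [None, 1.96, 1.96, 1.96]        # hypothetical: P = 3, every c_q = 1.96
bound_20 = limit_prob_select(2, crit)  # about 0.0475
bound_22 = bound_20 / 2                # right-hand side of (22)
```

Even with conventional 5% critical values the bound in (20) is a few percent, and it grows as the critical values shrink.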

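Proposition 2.4. Suppose that $A\tilde\theta(q)$ and $\tilde\theta_q(q)$ are asymptotically uncorrelated, that is, $C_\infty^{(q)} = 0$, for all $q$ satisfying $O < q \le P$. Then

$$\sup_{\theta\in\mathbb R^P}\ \sup_{\sigma_*\le\sigma\le\sigma^*} P_{n,\theta,\sigma}\big(\|\hat\Phi_{n,P}(\cdot) - G_{n,\theta,\sigma}(\cdot\mid\hat p)\|_{TV} > \delta\big) \xrightarrow[n\to\infty]{} 0 \tag{23}$$

holds for each $\delta > 0$, and for any constants $\sigma_*$ and $\sigma^*$ satisfying $0 < \sigma_* \le \sigma^* < \infty$.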

Inspection of the proof of Proposition 2.4 shows that (23) continues to hold if the estimator $\hat\Phi_{n,P}(\cdot)$ is replaced by any of the estimators $\hat\Phi_{n,p}(\cdot)$ for $O \le p \le P$. We also note that in case $O = 0$ the assumption of Proposition 2.4 is never satisfied (cf. Proposition 4.4 below), and hence, Theorem 2.3 always applies in that case. Furthermore, the case to which Proposition 2.4 applies is quite exceptional. In fact, under the assumptions of this proposition, the restricted estimators $A\tilde\theta(q)$ for $q \ge O$ perform asymptotically as well as the unrestricted estimator $A\tilde\theta(P)$. This is again a consequence of Proposition 4.4.

We conclude this section by illustrating the above results with some important examples.

Example 1 (The conditional distribution of $\tilde\chi$). Consider the model given in (1) with $\chi$ representing the parameter of interest. Using the general notation of Section 2, this corresponds to the case $A\theta = (\theta_1, \ldots, \theta_O)' = \chi$ with $A$ representing the $O \times P$ matrix $(I_O : 0)$. Here $k = O > 0$. The c.d.f. $G_{n,\theta,\sigma}(\cdot\mid p)$ then represents the c.d.f. of $\sqrt n(\tilde\chi - \chi)$ conditional on the event $\hat p = p$. Assume first that $\lim_{n\to\infty} V'W/n \neq 0$. Then $C_\infty^{(r)} \neq 0$ holds for some $r > O$. Consequently, the "impossibility" results for the estimation of $G_{n,\theta,\sigma}(t\mid\hat p)$ given in Theorem 2.3 always apply. Next assume that $\lim_{n\to\infty} V'W/n = 0$. Then $C_\infty^{(r)} = 0$ for every $r > O$. In this case Proposition 2.4 applies and a uniformly consistent estimator of $G_{n,\theta,\sigma}(t\mid\hat p)$ indeed exists. Summarizing, we note that any estimator of $G_{n,\theta,\sigma}(t\mid\hat p)$ suffers from the nonuniformity phenomenon, except in the special case where the columns of $V$ and $W$ are asymptotically orthogonal in the sense that $\lim_{n\to\infty} V'W/n = 0$. But this is precisely the situation where inclusion or exclusion of the regressors in $W$ has no effect on the (conditional) distribution of the estimator $\tilde\chi$ asymptotically; hence, it is not surprising that the model selection procedure also has no effect on the estimation of the c.d.f. of the post-model-selection estimator $\tilde\chi$. This observation may tempt one to enforce orthogonality between the columns of $V$ and $W$ by either replacing the columns of $V$ by their residuals from the projection on the column space of $W$ or vice versa. However, this is not helpful, for the following reasons. In the first case one in fact avoids model selection altogether, as all the restricted least-squares estimators for $\chi$ under consideration (and hence, also the post-model-selection estimator $\tilde\chi$) in the reparameterized model coincide with the unrestricted least-squares estimator. In the second case the coefficients of the columns of $V$ in the reparameterized model no longer coincide with the parameter of interest $\chi$ (and again are estimated by one and the same estimator regardless of inclusion or exclusion of columns of the transformed $W$-matrix).
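The first reparameterization can be checked numerically: after replacing $V$ by its residuals from projecting on the column space of $W$, the least-squares coefficient on $V$ is the same whichever columns of $W$ are included, so model selection has nothing left to decide. The simulated data, the dimensions and the helper `chi_hat` are hypothetical; this is a sketch of the underlying Frisch-Waugh-Lovell algebra, not a procedure from the paper.

```python
import numpy as np


def chi_hat(V, W_sub, y):
    """LS coefficients on the columns of V when y is regressed on [V, W_sub]."""
    Z = np.hstack([V, W_sub])
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    return coef[: V.shape[1]]


rng = np.random.default_rng(1)
n = 200
V = rng.standard_normal((n, 2))
W = rng.standard_normal((n, 3)) + 0.8 * V[:, [0]]   # W correlated with V
y = V @ np.array([1.0, -1.0]) + W @ np.array([0.5, 0.0, 0.5]) + rng.standard_normal(n)

# Original parameterization: the estimate of chi depends on which W columns enter.
raw = [chi_hat(V, W[:, :j], y) for j in range(W.shape[1] + 1)]

# First reparameterization: V replaced by its residuals from projecting on col(W).
V_res = V - W @ np.linalg.lstsq(W, V, rcond=None)[0]
orth = [chi_hat(V_res, W[:, :j], y) for j in range(W.shape[1] + 1)]
```

All entries of `orth` agree with each other and with `raw[-1]`, the unrestricted estimator in the original parameterization, whereas the entries of `raw` differ.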

Example 2 (The conditional distribution of $\tilde\theta$). For $A$ equal to $I_P$, the c.d.f. $G_{n,\theta,\sigma}(t\mid p)$ is the conditional c.d.f. of $\sqrt n(\tilde\theta - \theta)$ given $\hat p = p$. Here, $A\tilde\theta(q)$ reduces to $\tilde\theta(q)$, and hence, $A\tilde\theta(q)$ and $\tilde\theta_q(q)$ are perfectly correlated for every $q > O$. Consequently, the "impossibility" result for estimation of $G_{n,\theta,\sigma}(t\mid\hat p)$ given in Theorem 2.3 applies. We therefore see that estimation of the conditional distribution of the post-model-selection estimator of the entire parameter vector is always plagued by the nonuniformity phenomenon.

Example 3 (The conditional distribution of a linear predictor). Suppose $A \neq 0$ is a $1 \times P$ vector and one is interested in estimating the conditional c.d.f. $G_{n,\theta,\sigma}(t\mid\hat p)$ of the linear predictor $A\tilde\theta$. Then Theorem 2.3 and the discussion following Proposition 2.4 show that the nonuniformity phenomenon always arises in this estimation problem in case $O = 0$. In case $O > 0$, the nonuniformity problem is generically also present, except in the degenerate case where $C_\infty^{(q)} = 0$ for all $q$ satisfying $O < q \le P$ (in which case Proposition 4.4 shows that the least-squares predictors from all models $M_p$, $O \le p \le P$, perform asymptotically equally well).

3. Extensions to other model selection procedures including AIC. In this section we show that the "impossibility" result obtained in the previous section for a "general-to-specific" model selection procedure carries over to a large class of model selection procedures, including Akaike's widely used AIC. Again, consider the linear regression model (5) with the same assumptions on the regressors and the errors as in Section 2. Let $\{0,1\}^P$ denote the set of all 0-1 sequences of length $P$. For each $\tau \in \{0,1\}^P$, let $M_\tau$ denote the set $\{\theta \in \mathbb R^P : \theta_i(1-\tau_i) = 0 \text{ for } 1 \le i \le P\}$, where $\tau_i$ represents the $i$th component of $\tau$. That is, $M_\tau$ describes a linear submodel with those parameters $\theta_i$ for which $\tau_i = 0$ restricted to zero. Now let $\mathfrak M$ be a user-supplied subset of $\{0,1\}^P$. We consider model selection procedures that select from the set $\mathfrak M$, or, equivalently, from the set of models $\{M_\tau : \tau \in \mathfrak M\}$. Note that there is now no assumption that the candidate models are nested (e.g., if $\mathfrak M = \{0,1\}^P$, all possible submodels are candidates for selection). Also, cases where the inclusion of a subset of regressors is undisputed on a priori grounds are obviously covered by this framework upon suitable choice of $\mathfrak M$.

We shall assume throughout this section that $\mathfrak M$ contains $\tau_{\mathrm{full}} = (1, \ldots, 1)$ and also at least one element $\tau_*$ satisfying $|\tau_*| = P - 1$, where $|\tau_*|$ represents the number of nonzero coordinates of $\tau_*$. Let $\hat\tau$ be an arbitrary model selection procedure, that is, $\hat\tau = \hat\tau(Y, X)$ is a random variable taking its values in $\mathfrak M$. We furthermore assume throughout this section that the model selection procedure $\hat\tau$ satisfies the following mild condition: For every $\tau_* \in \mathfrak M$ with $|\tau_*| = P - 1$, there exists a positive finite constant $c$ (possibly depending on $\tau_*$) such that, for every $\theta \in M_{\tau_*}$ which has exactly $P - 1$ nonzero coordinates,

$$\lim_{n\to\infty} P_{n,\theta,\sigma}\big(\{\hat\tau = \tau_{\mathrm{full}}\} \,\triangle\, \{|T_{i(\tau_*)}| > c\}\big) = \lim_{n\to\infty} P_{n,\theta,\sigma}\big(\{\hat\tau = \tau_*\} \,\triangle\, \{|T_{i(\tau_*)}| \le c\}\big) = 0 \tag{24}$$

holds for every $0 < \sigma < \infty$. Here $\triangle$ denotes the symmetric difference operator and $T_{i(\tau_*)}$ represents the usual $t$-statistic for testing the hypothesis $\theta_{i(\tau_*)} = 0$ in the full model, where $i(\tau_*)$ denotes the index of the unique coordinate of $\tau_*$ that equals zero.

The above condition is quite natural for the following reason: For $\theta \in M_{\tau_*}$ with exactly $P - 1$ nonzero coordinates, every reasonable model selection procedure will, with probability approaching unity, decide only between $M_{\tau_*}$ and $M_{\tau_{\mathrm{full}}}$; it is then quite natural that this decision will be based (at least asymptotically) on the likelihood ratio between these two models, which in turn boils down to the $t$-statistic. As will be shown below, condition (24) holds in particular for AIC-like procedures.

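Let $A$ be a nonstochastic $k \times P$ matrix of full row rank $k$, $1 \le k \le P$, as in Section 2.1. For every $\tau \in \mathfrak M$, we then consider the conditional c.d.f.

$$K_{n,\theta,\sigma}(t\mid\tau) = P_{n,\theta,\sigma}\big(\sqrt n A(\tilde\theta - \theta) \le t \mid \hat\tau = \tau\big) \qquad (t \in \mathbb R^k) \tag{25}$$

of a linear transformation of the post-model-selection estimator $\tilde\theta$ obtained from the model selection procedure $\hat\tau$, that is,

$$\tilde\theta = \sum_{\tau\in\mathfrak M} \tilde\theta(\tau)\,\mathbf 1(\hat\tau = \tau),$$

where the $P \times 1$ vector $\tilde\theta(\tau)$ represents the restricted least-squares estimator obtained from model $M_\tau$, with the convention that $\tilde\theta(\tau) = 0$ in case $\tau = (0, \ldots, 0)$. [In case $P_{n,\theta,\sigma}(\hat\tau = \tau) = 0$, we define $K_{n,\theta,\sigma}(t\mid\tau)$ equal to, say, the c.d.f. of point-mass at zero in $\mathbb R^k$. This is done just for the sake of definiteness and has no effect on the results given below. For most model selection procedures, the probability $P_{n,\theta,\sigma}(\hat\tau = \tau)$ will be positive for any $\tau \in \mathfrak M$ anyway.] We also introduce

$$K_{n,\theta,\sigma}(t\mid\hat\tau) = \sum_{\tau\in\mathfrak M} K_{n,\theta,\sigma}(t\mid\tau)\,\mathbf 1(\hat\tau = \tau) \qquad (t \in \mathbb R^k). \tag{26}$$

We then obtain the following result for estimation of $K_{n,\theta,\sigma}(t\mid\hat\tau)$ at a fixed value of the argument $t$, which parallels the corresponding "impossibility" result in Section 2.2.2.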

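Theorem 3.1. Let $\tau_* \in \mathfrak M$ satisfy $|\tau_*| = P - 1$, and let $i(\tau_*)$ denote the index of the unique coordinate of $\tau_*$ that equals zero; furthermore, let $c$ be the constant in (24) corresponding to $\tau_*$. Suppose that $A\tilde\theta(\tau_{\mathrm{full}})$ and $\tilde\theta_{i(\tau_*)}(\tau_{\mathrm{full}})$ are asymptotically correlated, that is, $AQ^{-1}e_{i(\tau_*)} \neq 0$, where $e_{i(\tau_*)}$ denotes the $i(\tau_*)$th standard basis vector in $\mathbb R^P$. Then for every $\theta \in M_{\tau_*}$ which has exactly $P - 1$ nonzero coordinates, for each $0 < \sigma < \infty$ and for each $t \in \mathbb R^k$, the following holds: There exist $\delta_0 > 0$ and $0 < \rho_0 < \infty$ such that any estimator $\hat K_n(t\mid\hat\tau)$ of $K_{n,\theta,\sigma}(t\mid\hat\tau)$ satisfying

$$P_{n,\theta,\sigma}\big(|\hat K_n(t\mid\hat\tau) - K_{n,\theta,\sigma}(t\mid\hat\tau)| > \delta\big) \xrightarrow[n\to\infty]{} 0 \tag{27}$$

for each $\delta > 0$ (in particular, every estimator that is consistent) also satisfies

$$\liminf_{n\to\infty}\ \sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat K_n(t\mid\hat\tau) - K_{n,\vartheta,\sigma}(t\mid\hat\tau)| > \delta_0\big) \ \ge\ 2\big(1-\Phi(c)\big) > 0. \tag{28}$$

The constants $\delta_0$ and $\rho_0$ may be chosen in such a way that they depend only on $t$, $Q$, $A$, $\sigma$ and $c$. Moreover,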

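$$\liminf_{n\to\infty}\ \inf_{\hat K_n(t\mid\hat\tau)}\ \sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat K_n(t\mid\hat\tau) - K_{n,\vartheta,\sigma}(t\mid\hat\tau)| > \delta_0\big) > 0 \tag{29}$$

and

$$\sup_{\delta>0}\ \liminf_{n\to\infty}\ \inf_{\hat K_n(t\mid\hat\tau)}\ \sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat K_n(t\mid\hat\tau) - K_{n,\vartheta,\sigma}(t\mid\hat\tau)| > \delta\big) \ \ge\ 1-\Phi(c) > 0 \tag{30}$$

hold, where the infima in (29) and (30) extend over all estimators $\hat K_n(t\mid\hat\tau)$ of $K_{n,\theta,\sigma}(t\mid\hat\tau)$. [The lower bound in (28) is nothing other than $\lim_{n\to\infty} P_{n,\theta,\sigma}(\hat\tau = \tau_{\mathrm{full}})$.]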

The basic condition (24) on the model selection procedure employed in the above results will certainly hold for any hypothesis testing procedure that (i) asymptotically selects only correct models, (ii) employs a likelihood ratio test (or an asymptotically equivalent test) for testing $M_{\tau_{\mathrm{full}}}$ versus smaller models [at least versus the models $M_{\tau_*}$ with $\tau_*$ as in condition (24)], and (iii) uses a critical value for the likelihood ratio test that converges to a finite positive constant. In particular, this applies to usual thresholding procedures, as well as to a variant of the "general-to-specific" procedure discussed in Section 2 where the error variance in the construction of the test statistic for hypothesis $H_0^p$ is estimated from the fitted model $M_p$ rather than from the overall model. We next verify condition (24) for AIC-like procedures. Let $RSS(\tau)$ denote the residual sum of squares from the regression employing model $M_\tau$ and set

$$IC(\tau) = \log\big(RSS(\tau)\big) + |\tau|\,\Upsilon_n/n, \tag{31}$$

where $\Upsilon_n > 0$ denotes a sequence of real numbers satisfying $\lim_{n\to\infty}\Upsilon_n = \Upsilon$ and $\Upsilon$ is a positive real number. Of course, $IC(\tau) = AIC(\tau)$ if $\Upsilon_n = 2$. The model selection procedure $\hat\tau_{IC}$ is then defined as a minimizer (more precisely, as a measurable selection from the set of minimizers) of $IC(\tau)$ over $\mathfrak M$. It is well known that the probability that $\hat\tau_{IC}$ selects an incorrect model converges to zero. Hence, elementary calculations show that condition (24) is satisfied for $c = \Upsilon^{1/2}$.
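A minimal sketch of the AIC-like rule (31), assuming the candidate set is all of $\{0,1\}^P$, that ties are broken by iteration order (one possible measurable selection), and that $\Upsilon_n = 2$ (AIC). It also evaluates the lower bound $2(1-\Phi(c))$ in (28) at $c = \Upsilon^{1/2}$, using the identity $2(1-\Phi(c)) = \operatorname{erfc}(c/\sqrt 2)$; for AIC this is roughly 0.157. The function names and the simulated data are hypothetical.

```python
import itertools
from math import erfc, sqrt

import numpy as np


def ic_select(X, y, candidates, upsilon_n):
    """Minimize IC(tau) = log(RSS(tau)) + |tau| * upsilon_n / n over candidates.

    Ties go to the first minimizer in iteration order; tau = (0, ..., 0)
    uses the convention theta(tau) = 0, so RSS = y'y.
    """
    n, P = X.shape
    best_tau, best_val = None, np.inf
    for tau in candidates:
        cols = [i for i in range(P) if tau[i] == 1]
        if cols:
            coef = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
            rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
        else:
            rss = float(y @ y)
        val = np.log(rss) + sum(tau) * upsilon_n / n
        if val < best_val:
            best_tau, best_val = tau, val
    return best_tau


def limit_prob_full(upsilon):
    """Lower bound 2(1 - Phi(c)) in (28) with c = sqrt(upsilon)."""
    return erfc(sqrt(upsilon / 2.0))


all_models = list(itertools.product((0, 1), repeat=3))   # {0,1}^P with P = 3
aic_bound = limit_prob_full(2.0)                         # AIC: about 0.157
```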
4. Further theoretical results.

4.1. "General-to-specific" model selection procedure. In this section we provide "impossibility" results for estimation of $G_{n,\theta,\sigma}(t\mid p)$ for a given $p$, which are parallel to the "impossibility" result for estimation of $G_{n,\theta,\sigma}(t\mid\hat p)$ given in Section 2.2.2. The results presented below can be viewed as conditional counterparts to the results in that earlier section. Apart from being of interest on their own, the results given below also form the essential building blocks for the "impossibility" result in Section 2.2.2. In the next two theorems we shall consider estimation of $G_{n,\theta,\sigma}(t\mid p)$ at a fixed value of the argument $t$.

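Theorem 4.1. Let $p$ satisfy $O < p \le P$. Suppose that $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ are asymptotically correlated, that is, $C_\infty^{(p)} \neq 0$. Then the following holds for each $\theta \in M_{p-1}$, $0 < \sigma < \infty$, and for each $t \in \mathbb R^k$ with the property that the set $\{z \in \mathbb R^p : A[p]z \le t\}$ has positive Lebesgue measure in $\mathbb R^p$:

(a) There exist $\delta_0 > 0$ and $0 < \rho_0 < \infty$ such that any estimator $\hat G_n(t\mid p)$ for $G_{n,\theta,\sigma}(t\mid p)$ satisfying

$$P_{n,\theta,\sigma}\big(|\hat G_n(t\mid p) - G_{n,\theta,\sigma}(t\mid p)| > \delta\big) \xrightarrow[n\to\infty]{} 0 \tag{32}$$

for each $\delta > 0$ (in particular, every estimator that is consistent over $M_p$) also satisfies

$$\sup_{\vartheta\in M_p,\ \|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat G_n(t\mid p) - G_{n,\vartheta,\sigma}(t\mid p)| > \delta_0\big) \xrightarrow[n\to\infty]{} 1. \tag{33}$$

The constants $\delta_0$ and $\rho_0$ may be chosen in such a way that they depend only on $t$, $Q$, $A$, $\sigma$ and the critical value $c_p$. Moreover,

$$\liminf_{n\to\infty}\ \inf_{\hat G_n(t\mid p)}\ \sup_{\vartheta\in M_p,\ \|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat G_n(t\mid p) - G_{n,\vartheta,\sigma}(t\mid p)| > \delta_0\big) > 0 \tag{34}$$

and

$$\sup_{\delta>0}\ \liminf_{n\to\infty}\ \inf_{\hat G_n(t\mid p)}\ \sup_{\vartheta\in M_p,\ \|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat G_n(t\mid p) - G_{n,\vartheta,\sigma}(t\mid p)| > \delta\big) \ \ge\ \tfrac12 \tag{35}$$

hold, where the infima in (34) and (35) extend over all estimators $\hat G_n(t\mid p)$ of $G_{n,\theta,\sigma}(t\mid p)$.

(b) The above continues to hold with $P_{n,\vartheta,\sigma}(\cdot\mid\hat p = p)$ replacing $P_{n,\vartheta,\sigma}(\cdot)$.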

The condition on the set $\{z \in \mathbb R^p : A[p]z \le t\}$ in the above theorem is easily seen to be equivalent to the condition that $A[p]z < t$ holds for some $z \in \mathbb R^p$. It is trivially always satisfied whenever $t > 0$ (componentwise). The condition on $\{z \in \mathbb R^p : A[p]z \le t\}$ is certainly satisfied for every $t \in \mathbb R^k$ if the matrix $A[p]$ has full row rank $k$. We shall repeatedly use the observation that the latter rank condition is always met if $p > 0$ is the maximal order for which $C_\infty^{(p)} \neq 0$ holds. [This follows from Proposition 4.4(a), (c).]

As a point of interest, we note that the nonuniformity phenomenon described in Theorem 4.1 occurs within the model $M_p$, which contains only parameters for which the selected model is correct; that is, in (33)-(35) the suprema with respect to $\vartheta$ extend only over subsets of $M_p$. Hence, it is typically impossible to construct an estimator of $G_{n,\theta,\sigma}(t\mid p)$ that performs satisfactorily even for those local perturbations $\vartheta$ of the true parameter $\theta \in M_{p-1}$ for which the selected model is correct.

Consider next the case where Theorem 4.1 does not apply, that is, the model order $p$ under consideration is such that either $p = O$, or $p > O$ but $C_\infty^{(p)} = 0$, or $p > O$ and $C_\infty^{(p)} \neq 0$ but the set $\{z \in \mathbb R^p : A[p]z \le t\}$ has Lebesgue measure zero. In that case, it is indeed possible to construct an estimator of $G_{n,\theta,\sigma}(t\mid p)$ that is uniformly consistent over $\theta \in M_p$. However, this result provides little consolation, because the uniform consistency over $\theta \in M_p$ typically breaks down already in $1/\sqrt n$-"neighborhoods" of $M_p$, and results analogous to (33)-(35) can be established over such neighborhoods, even if Theorem 4.1 does not apply. This is relevant because true parameter values in such $1/\sqrt n$-"neighborhoods" result in a positive probability of selecting the model $M_p$; see Proposition A.2 in Appendix A.

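Theorem 4.2. Let $p$ satisfy $O \le p < P$. Suppose that $A\tilde\theta(q)$ and $\tilde\theta_q(q)$ are asymptotically correlated, that is, $C_\infty^{(q)} \neq 0$, for some $q$ satisfying $p < q \le P$, and let $q^*$ denote the largest $q$ with this property. Then the following holds for each $\theta \in M_p$, $0 < \sigma < \infty$, and for each $t \in \mathbb R^k$:

(a) There exist $\delta_0 > 0$ and $0 < \rho_0 < \infty$ such that any estimator $\hat G_n(t\mid p)$ for $G_{n,\theta,\sigma}(t\mid p)$ satisfying

$$P_{n,\theta,\sigma}\big(|\hat G_n(t\mid p) - G_{n,\theta,\sigma}(t\mid p)| > \delta\big) \xrightarrow[n\to\infty]{} 0 \tag{36}$$

for each $\delta > 0$ (in particular, every estimator that is consistent over $M_p$) also satisfies

$$\sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat G_n(t\mid p) - G_{n,\vartheta,\sigma}(t\mid p)| > \delta_0\big) \xrightarrow[n\to\infty]{} 1. \tag{37}$$

The constants $\delta_0$ and $\rho_0$ may be chosen in such a way that they depend only on $t$, $Q$, $A$, $\sigma$ and the critical value $c_p$. Moreover,

$$\liminf_{n\to\infty}\ \inf_{\hat G_n(t\mid p)}\ \sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat G_n(t\mid p) - G_{n,\vartheta,\sigma}(t\mid p)| > \delta_0\big) > 0 \tag{38}$$

and

$$\sup_{\delta>0}\ \liminf_{n\to\infty}\ \inf_{\hat G_n(t\mid p)}\ \sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat G_n(t\mid p) - G_{n,\vartheta,\sigma}(t\mid p)| > \delta\big) \ \ge\ \tfrac12 \tag{39}$$

hold, where the infima in (38) and (39) extend over all estimators $\hat G_n(t\mid p)$ of $G_{n,\theta,\sigma}(t\mid p)$.

(b) The above continues to hold with $P_{n,\vartheta,\sigma}(\cdot\mid\hat p = p)$ replacing $P_{n,\vartheta,\sigma}(\cdot)$.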

We stress here once more that the probability of selecting the model order $p$ is bounded away from zero uniformly over the $\vartheta$-sets appearing in the suprema in (37)-(39); see Proposition A.2 in Appendix A. Hence, the nonuniformity phenomenon we observe is not an artifact resulting from conditioning on an unlikely event. It is also worth noting that the lower bounds in the above results are as large as 1 and 1/2, respectively.

Summarizing so far, we see that it is impossible to construct an estimator of $G_{n,\theta,\sigma}(t\mid p)$ which performs reasonably well in a neighborhood of the true parameter $\theta$ ($\theta \in M_p$) whenever the model order $p$ considered has the property that $A\tilde\theta(q)$ and $\tilde\theta_q(q)$ are asymptotically correlated for some $q$ with $\max\{p, O+1\} \le q \le P$, as then either Theorem 4.1 or Theorem 4.2 applies. In particular, no uniformly consistent estimator exists, not even locally. In the remaining case, that is, when $A\tilde\theta(q)$ and $\tilde\theta_q(q)$ are asymptotically uncorrelated for each $q$ in the range $\max\{p, O+1\} \le q \le P$, it is indeed possible to construct an estimator of $G_{n,\theta,\sigma}(t\mid p)$ which is uniformly consistent (even in the total variation distance) over $1/\sqrt n$-"neighborhoods" of $M_p$, as shown next.

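Proposition 4.3. Let $p$ satisfy $O \le p \le P$. Suppose that $A\tilde\theta(q)$ and $\tilde\theta_q(q)$ are asymptotically uncorrelated, that is, $C_\infty^{(q)} = 0$ for each $q = \max\{p, O+1\}, \ldots, P$. Then

$$\sup_{\theta\in\mathbb R^P:\ \|\theta[\neg p]\|<\rho/\sqrt n}\ \sup_{\sigma_*\le\sigma\le\sigma^*} P_{n,\theta,\sigma}\big(\|\hat\Phi_{n,p}(\cdot) - G_{n,\theta,\sigma}(\cdot\mid p)\|_{TV} > \delta\big) \xrightarrow[n\to\infty]{} 0 \tag{40}$$

holds for each $\delta > 0$, for each $0 < \rho < \infty$, and for any constants $\sigma_*$ and $\sigma^*$ satisfying $0 < \sigma_* \le \sigma^* < \infty$. The result (40) also holds with $P_{n,\theta,\sigma}(\cdot\mid\hat p = p)$ replacing $P_{n,\theta,\sigma}(\cdot)$. [In case $p = P$, the first supremum in (40) is to be interpreted as extending over all $\theta \in \mathbb R^P$. Furthermore, the case $p = 0$ is impossible in view of Proposition 4.4 below.]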

If the uncorrelatedness assumptions in the proposition even hold for all finite $n$, then the c.d.f. $G_{n,\theta,\sigma}(\cdot\mid p)$ can be seen to reduce to the normal c.d.f. $\Phi_{n,p}(\cdot)$ and, hence, can be estimated uniformly consistently over the larger space $M_p \times [\sigma_*, \sigma^*]$.

Clearly, the case to which Proposition 4.3 applies is quite exceptional. In fact, under the assumptions of this proposition, the restricted estimators $A\tilde\theta(q)$ for $q \ge \max\{p-1, O\}$ perform asymptotically as well as the unrestricted estimator $A\tilde\theta(P)$. This is a consequence of the following result.

Proposition 4.4. Let $p$ satisfy $0 \le p \le P$. Then the following statements are equivalent:

(a) $A\tilde\theta(q)$ and $\tilde\theta_q(q)$ are asymptotically uncorrelated, that is, $C_\infty^{(q)} = (0, \ldots, 0)'$ for each $q = p+1, \ldots, P$.

(b) $A\tilde\theta(p)$ is an asymptotically unbiased estimator of $A\theta$ ($\theta \in \mathbb R^P$).

(c) The asymptotic variance-covariance matrices of $\sqrt n A\tilde\theta(p)$ and $\sqrt n A\tilde\theta(P)$ are identical.

In case $p = P$, the above statements are always trivially satisfied. In case $p = 0$, these statements are never satisfied.
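For given $A$ and $Q$, the three statements of Proposition 4.4 can be checked numerically as a sanity check on the equivalence. As assumptions of ours (not stated in this section), the sketch uses the asymptotic bias matrix $A[p]Q[p:p]^{-1}Q[p:\neg p] - A[\neg p]$ of $A\tilde\theta(p)$ for (b), and the matrices $A[p]Q[p:p]^{-1}A[p]'$ and $AQ^{-1}A'$ (the common factor $\sigma^2$ dropped) for (c). The function name and the example are hypothetical.

```python
import numpy as np


def prop44_statements(A, Q, p, tol=1e-10):
    """Check statements (a), (b), (c) of Proposition 4.4 for given A, Q and p."""
    P = Q.shape[0]
    Qpp_inv = np.linalg.inv(Q[:p, :p]) if p > 0 else np.zeros((0, 0))
    # (a) C_inf^(q) = A[q] Q[q:q]^{-1} e_q vanishes for q = p+1, ..., P
    a = all(
        np.linalg.norm(A[:, :q] @ np.linalg.inv(Q[:q, :q])[:, q - 1]) < tol
        for q in range(p + 1, P + 1)
    )
    # (b) asymptotic bias matrix A[p] Q[p:p]^{-1} Q[p:~p] - A[~p] vanishes
    bias = A[:, :p] @ Qpp_inv @ Q[:p, p:] - A[:, p:]
    b = np.linalg.norm(bias) < tol
    # (c) A[p] Q[p:p]^{-1} A[p]' equals A Q^{-1} A' (common factor sigma^2 dropped)
    cov_p = A[:, :p] @ Qpp_inv @ A[:, :p].T
    cov_P = A @ np.linalg.inv(Q) @ A.T
    c = np.allclose(cov_p, cov_P, atol=tol)
    return bool(a), bool(b), bool(c)


# Hypothetical example: regressor 1 correlated with regressor 2, regressor 3 orthogonal.
A = np.array([[1.0, 0.0, 0.0]])
Q = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
```

Here all three statements fail for $p = 0, 1$ and all three hold for $p = 2, 3$, as the proposition predicts.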

It is easy to see that any of the above statements is equivalent to asymptotic unbiasedness of $A\tilde\theta(q)$ for all $q = p, \ldots, P$, and further, is also equivalent to all the asymptotic variance-covariance matrices of $\sqrt n A\tilde\theta(q)$ for $q = p, \ldots, P$ being identical. Furthermore, a finite-sample version of Proposition 4.4 can also easily be derived from the discussion following (19) in [10]. In fact, it is shown in that reference, for any given sample size, that uncorrelatedness of $A\tilde\theta(q)$ and $\tilde\theta_q(q)$ for $q = p+1, \ldots, P$ is equivalent to the estimators $A\tilde\theta(p)$ and $A\tilde\theta(P)$ being identical, which, in turn, is equivalent to all the estimators $A\tilde\theta(q)$ being identical for $q = p, \ldots, P$.

We conclude this section by illustrating the above results with some important examples.

Example 1 continued (The conditional distribution of $\tilde\chi$). Assume first that $\lim_{n\to\infty} V'W/n \neq 0$ is satisfied. Then, as already noted, $C_\infty^{(r)} \neq 0$ holds for some $r > O$. Consequently, for any such $r$, the "impossibility" results in Theorem 4.1 apply with $p = r$ (observe that $\operatorname{rank}(A[p]) = O = k$ always holds for $p = r > O$ and, hence, the condition on $t$ in that theorem is always satisfied). Furthermore, the "impossibility" results in Theorem 4.2 apply for any $p$ satisfying $O \le p < r$ for some $r$ as above. Next assume that $\lim_{n\to\infty} V'W/n = 0$. Then $C_\infty^{(r)} = 0$ for every $r > O$. In this case Proposition 4.3 applies for every $O \le p \le P$, and an estimator of $G_{n,\theta,\sigma}(t\mid p)$ that is uniformly consistent over $1/\sqrt n$-"neighborhoods" of $M_p$ indeed exists.

Example 2 continued (The conditional distribution of $\tilde\theta$). As already noted, we have here $A = I_P$, and $A\tilde\theta(q)$ and $\tilde\theta_q(q)$ are perfectly correlated for every $q > O$. Therefore, Theorem 4.1 applies for all $t \in \mathbb R^k$ if $p = P$, and Theorem 4.2 applies in case $p < P$. (In the latter case, Theorem 4.1 still applies for certain $t \in \mathbb R^k$.) Consequently, estimation of the conditional distribution $G_{n,\theta,\sigma}(t\mid p)$ of the entire parameter vector is always plagued by the nonuniformity phenomenon.

Example 4 (The conditional distribution of the unrestricted components of $\tilde\theta$). Let $r > 0$ be a given model order. Conditional on the event $\hat p = r$, the last $P - r$ components of $\tilde\theta$ are restricted to zero. If $A$ is the $r \times P$ matrix $(I_r : 0)$, then the c.d.f. $G_{n,\theta,\sigma}(t\mid r)$ is the conditional c.d.f. of the first $r$ (unrestricted) components of $\sqrt n(\tilde\theta - \theta)$ given the event $\hat p = r$. In this case $A\tilde\theta(r)$ and $\tilde\theta_r(r)$ are perfectly correlated. If $r > O$, Theorem 4.1 immediately applies with $p = r$, because $\operatorname{rank}(A[r]) = r$ entails that the condition on $t$ in that theorem is always satisfied. In case $O \le r < P$ and $\lim_{n\to\infty} X[r]'X[\neg r]/n \neq 0$, Theorem 4.2 applies with $p = r$, since, under the latter condition on the regressors, $C_\infty^{(q)} \neq 0$ holds for some $q > r$. As a consequence, the nonuniformity phenomenon is always present when estimating this conditional c.d.f., except in the very special case where $r = O > 0$ and $\lim_{n\to\infty} X[r]'X[\neg r]/n = 0$ simultaneously hold; in this case Proposition 4.3 applies.

4.2. Other model selection procedures including AIC. We use the notation and assumptions of Section 3 here. In particular, the model selection procedure $\hat\tau$ is assumed to satisfy condition (24). The proof of Theorem 3.1 relies on the subsequent result, which is also of interest in itself. As in the preceding sections, estimation of $K_{n,\theta,\sigma}(t\mid\tau)$ at a fixed value of the argument $t$ is considered.

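Theorem 4.5. Let $\tau_* \in \mathfrak M$ satisfy $|\tau_*| = P - 1$, and let $i(\tau_*)$ denote the index of the unique coordinate of $\tau_*$ that equals zero; furthermore, let $c$ be the constant in (24) corresponding to $\tau_*$. Suppose that $A\tilde\theta(\tau_{\mathrm{full}})$ and $\tilde\theta_{i(\tau_*)}(\tau_{\mathrm{full}})$ are asymptotically correlated, that is, $AQ^{-1}e_{i(\tau_*)} \neq 0$, where $e_{i(\tau_*)}$ denotes the $i(\tau_*)$th standard basis vector in $\mathbb R^P$. Then for every $\theta \in M_{\tau_*}$ which has exactly $P - 1$ nonzero coordinates, for each $0 < \sigma < \infty$, and for each $t \in \mathbb R^k$, the following holds with $\tau = \tau_*$ as well as with $\tau = \tau_{\mathrm{full}}$:

(a) There exist $\delta_0 > 0$ and $0 < \rho_0 < \infty$ such that any estimator $\hat K_n(t\mid\tau)$ for $K_{n,\theta,\sigma}(t\mid\tau)$ satisfying

$$P_{n,\theta,\sigma}\big(|\hat K_n(t\mid\tau) - K_{n,\theta,\sigma}(t\mid\tau)| > \delta\big) \xrightarrow[n\to\infty]{} 0 \tag{41}$$

for each $\delta > 0$ (in particular, every estimator that is consistent over $M_{\tau_*}$) also satisfies

$$\sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat K_n(t\mid\tau) - K_{n,\vartheta,\sigma}(t\mid\tau)| > \delta_0\big) \xrightarrow[n\to\infty]{} 1. \tag{42}$$

The constants $\delta_0$ and $\rho_0$ may be chosen in such a way that they depend only on $t$, $Q$, $A$, $\sigma$, and also on $c$ in case $\tau = \tau_{\mathrm{full}}$. Moreover,

$$\liminf_{n\to\infty}\ \inf_{\hat K_n(t\mid\tau)}\ \sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat K_n(t\mid\tau) - K_{n,\vartheta,\sigma}(t\mid\tau)| > \delta_0\big) > 0 \tag{43}$$

and

$$\sup_{\delta>0}\ \liminf_{n\to\infty}\ \inf_{\hat K_n(t\mid\tau)}\ \sup_{\|\vartheta-\theta\|<\rho_0/\sqrt n} P_{n,\vartheta,\sigma}\big(|\hat K_n(t\mid\tau) - K_{n,\vartheta,\sigma}(t\mid\tau)| > \delta\big) \ \ge\ \tfrac12 \tag{44}$$

hold, where the infima in (43) and (44) extend over all estimators $\hat K_n(t\mid\tau)$ of $K_{n,\theta,\sigma}(t\mid\tau)$.

(b) The above continues to hold with $P_{n,\vartheta,\sigma}(\cdot\mid\hat\tau = \tau)$ replacing $P_{n,\vartheta,\sigma}(\cdot)$.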

We note that the conditional probability in Theorem 4.5(b) is eventually 
well defined; see (61)-(62) in Appendix E. 

4.3. Remarks and extensions. 

Remark 4.6. Although not emphasized in the notation, all results in the paper also hold if the elements of the design matrix $X$ depend on sample size. Furthermore, all results are expressed solely in terms of the distributions $P_{n,\theta,\sigma}(\cdot)$ of $Y$, and hence, they also apply if the elements of $Y$ depend on sample size, including the case where the random vectors $Y$ are defined on different probability spaces for different sample sizes.

Remark 4.7. The model selection procedure introduced in Section 2 is based on a sequence of tests which use critical values $c_p$ that do not depend on sample size and satisfy $0 < c_p < \infty$ for $O < p \le P$. If these critical values are allowed to depend on sample size such that they now satisfy $c_{n,p} \to c_{\infty,p}$ as $n \to \infty$ with $0 < c_{\infty,p} < \infty$ for $O < p \le P$, the results in [12], as well as in [10, 11], continue to hold; see Remark 6.2(i) in [12] and Remark 6.1(ii) in [10]. As a consequence, the results in the present paper can also be extended to this case quite easily.

Remark 4.8. The "impossibility" results given in Theorems 2.3, 3.1, 4.1, 4.2 and 4.5 (as well as the variants thereof discussed in the subsequent Remark 4.9) also hold for the class of all randomized estimators (with $P^*_{n,\theta,\sigma}$ replacing $P_{n,\theta,\sigma}$ in those results, where $P^*_{n,\theta,\sigma}$ denotes the distribution of the randomized sample). This follows immediately from Lemma 3.6 and the accompanying discussion in [16].

Remark 4.9. Results similar to the ones in Sections 2.2.2 and 4.1 can also be obtained for estimation of the asymptotic c.d.f. $G_{\infty,\theta,\sigma}(t\mid p)$. Since these results are of limited interest, we omit them. In particular, note that an "impossibility" result for estimation of $G_{\infty,\theta,\sigma}(t\mid p)$ per se does not imply a corresponding "impossibility" result for estimation of $G_{n,\theta,\sigma}(t\mid p)$, since $G_{n,\theta,\sigma}(t\mid p)$ does in general not converge uniformly to $G_{\infty,\theta,\sigma}(t\mid p)$ over the relevant subsets in the parameter space; see Remark 4.11 in [14]. (Appropriate analogues apply to the model selection procedures considered in Sections 3 and 4.2.)

5. Conclusion. Despite the fact that we have shown that consistent estimators for the conditional distribution of a post-model-selection estimator can be constructed with relative ease, we have also demonstrated that no estimator of this conditional distribution can have satisfactory performance (locally) uniformly in the parameter space, even asymptotically. In particular, no (locally) uniformly consistent estimator of the conditional distribution exists. Hence, the answer to the question posed in the title has to be negative. The results in the present paper also cover the case of linear functions (e.g., predictors) of the post-model-selection estimator. Corresponding results for the unconditional distribution of the post-model-selection estimator are presented in a companion paper [13].

The "impossibility" results are derived in the framework of a normal linear regression model (and a fortiori these results continue to hold in any model which includes the normal linear regression model as a special case). Furthermore, there is no reason to believe that the situation will get any better in more complex statistical models that allow, for example, for nonlinearity or dependent data. In fact, similar results can be obtained in general statistical models, for example, as long as standard regularity conditions for maximum likelihood theory are satisfied.

The results in the present paper are derived for a large class of conservative model selection procedures (i.e., procedures that select overparameterized models with positive probability asymptotically), including Akaike's AIC and typical "general-to-specific" hypothesis testing procedures. For consistent model selection procedures, such as BIC or testing procedures with suitably diverging critical values $c_p$ (cf. [2]), the (pointwise) asymptotic distribution is always normal. (This is elementary; cf. Lemma 1 in [18].) However, as discussed at length in [15], this asymptotic normality result paints a misleading picture of the finite-sample distribution, which can be far from normal, the convergence of the finite-sample distribution to the asymptotic normal distribution not being uniform. "Impossibility" results similar to the ones presented here can also be obtained for post-model-selection estimators based on consistent model selection procedures. These will be discussed in detail elsewhere. For a simple special case, such an "impossibility" result is given in Section 2.3 of [16].

The "impossibility" of estimating the distribution of the post-model-selection estimator does not per se preclude the possibility of conducting valid inference after model selection, a topic that deserves further study. However, it certainly makes this a more challenging task.

APPENDIX A: THE LARGE-SAMPLE LIMIT OF GN,eAT\P) 
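For $p$ satisfying $0 < p < P$, partition the matrix $Q = \lim_{n\to\infty} X'X/n$ as

$$Q = \begin{pmatrix} Q[p:p] & Q[p:\neg p] \\ Q[\neg p:p] & Q[\neg p:\neg p] \end{pmatrix},$$

where $Q[p:p]$ is a $p \times p$ matrix. For $p = 1, \ldots, P$, define

$$\zeta^2_{\infty,p} = \xi^2_{\infty,p} - C_\infty^{(p)\prime}\bigl(A[p]Q[p:p]^{-1}A[p]'\bigr)^{-}C_\infty^{(p)}, \qquad b_{\infty,p} = C_\infty^{(p)\prime}\bigl(A[p]Q[p:p]^{-1}A[p]'\bigr)^{-}, \tag{45}$$

where $C_\infty^{(p)} = A[p]Q[p:p]^{-1}e_p$, with $e_p$ denoting the $p$th standard basis vector in $\mathbb{R}^p$; furthermore, take $\xi_{\infty,p}$ and $\zeta_{\infty,p}$ as the nonnegative square roots of $\xi^2_{\infty,p}$ and $\zeta^2_{\infty,p}$, respectively. As the notation suggests, $\Phi_{\infty,p}(t)$ is the large-sample limit of $\Phi_{n,p}(t)$ defined in Section 2. Moreover, $C_\infty^{(p)}$, $\xi_{\infty,p}$ and $\zeta_{\infty,p}$ are the limits of $C_n^{(p)}$, $\xi_{n,p}$ and $\zeta_{n,p}$, respectively, and $b_{n,p}z$ converges to $b_{\infty,p}z$ for each $z$ in the column-space of $A[p]$. See Lemma A.2 in [10].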
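For a single linear function of the parameters ($k = 1$) the quantities in (45) are scalars and can be computed directly from $Q$ and the row vector $A$. The Python sketch below is only an illustration under stated assumptions: it takes $\xi^2_{\infty,p}$ to be the $(p,p)$ entry of $Q[p:p]^{-1}$ (our reading of the Section 2 definition), uses the Moore-Penrose inverse of the scalar $A[p]Q[p:p]^{-1}A[p]'$ as the generalized inverse, and the function names are hypothetical.

```python
def inverse(m):
    """Gauss-Jordan inverse of a small nonsingular square matrix given as nested lists."""
    n = len(m)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def limit_quantities(Q, A, p):
    """Return (C, xi2, zeta2, b) of (45) for k = 1 and model order p >= 1.

    Q is the P x P limit of X'X/n, A is a length-P row vector, A[p] = A[:p].
    """
    Qp_inv = inverse([row[:p] for row in Q[:p]])        # Q[p:p]^{-1}
    a = A[:p]
    C = sum(a[j] * Qp_inv[j][p - 1] for j in range(p))  # A[p] Q[p:p]^{-1} e_p
    xi2 = Qp_inv[p - 1][p - 1]                          # assumed xi^2_{infty,p}
    v = sum(a[i] * Qp_inv[i][j] * a[j] for i in range(p) for j in range(p))
    v_ginv = 1.0 / v if v > 1e-15 else 0.0              # Moore-Penrose inverse of a scalar
    b = C * v_ginv
    zeta2 = xi2 - C * v_ginv * C
    return C, xi2, zeta2, b
```

For example, with $Q$ having unit diagonal and off-diagonal entry $0.5$, $A = (1, 0)$ and $p = 2$, the call returns $C = -2/3$, $\xi^2 = 4/3$, $\zeta^2 = 1$ and $b = -1/2$. With $A = (0, 1)$ one gets $\zeta^2 = 0$: then $A\tilde\theta(2)$ is $\tilde\theta_2(2)$ itself, which is the degenerate case $\zeta_{\infty,p} = 0$ singled out in Lemma B.1.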

The next result is taken from Corollary 5.4 in [10] and describes the large-sample limit of the conditional c.d.f. under local alternatives to $\theta$, under the assumption that the selected model $M_p$ is a correct model for $\theta$. Recall that the total variation distance between two c.d.f.s $G$ and $G^*$ on $\mathbb{R}^k$ is defined as $\|G - G^*\|_{TV} = \sup_E |G(E) - G^*(E)|$, where the supremum is taken over all Borel sets $E$. Clearly, the relation $|G(t) - G^*(t)| \le \|G - G^*\|_{TV}$ holds for all $t \in \mathbb{R}^k$. Thus, if $G$ and $G^*$ are close with respect to the total variation distance, then $G(t)$ is close to $G^*(t)$, uniformly in $t$.

Proposition A.1. Let $p$ satisfy $O \le p \le P$. Suppose $\theta \in \mathbb{R}^P$ satisfies $\theta \in M_p$, that is, $p_0(\theta) \le p$ holds. Moreover, let $\gamma \in \mathbb{R}^P$ and let $\sigma^{(n)}$ be a sequence of positive real numbers which converges to a (finite) limit $\sigma > 0$ as $n \to \infty$. Then the conditional c.d.f. $G_{n,\theta+\gamma/\sqrt{n},\sigma^{(n)}}(t|p)$ converges to a limit $G_{\infty,\theta,\sigma,\gamma}(t|p)$ in total variation, that is,

$$\bigl\|G_{n,\theta+\gamma/\sqrt{n},\sigma^{(n)}}(\cdot|p) - G_{\infty,\theta,\sigma,\gamma}(\cdot|p)\bigr\|_{TV} \to 0. \tag{46}$$

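The large-sample limit c.d.f. $G_{\infty,\theta,\sigma,\gamma}(t|p)$ is given as follows. In case $p = \max\{p_0(\theta), O\}$,

$$G_{\infty,\theta,\sigma,\gamma}(t|p) = \Phi_{\infty,p}\bigl(t - \beta^{(p)}\bigr). \tag{47}$$

Here,

$$\beta^{(p)} = A\begin{pmatrix} Q[p:p]^{-1}Q[p:\neg p]\gamma[\neg p] \\ -\gamma[\neg p] \end{pmatrix} \qquad (0 < p < P),$$

with the convention that $\beta^{(p)} = -A\gamma$ if $p = 0$ and that $\beta^{(p)} = 0$ if $p = P$. In case $p > \max\{p_0(\theta), O\}$,

$$G_{\infty,\theta,\sigma,\gamma}(t|p) = \int_{z \le t - \beta^{(p)}} \frac{1 - \Delta_{\sigma\zeta_{\infty,p}}\bigl(\nu_p + b_{\infty,p}z,\ c_p\sigma\xi_{\infty,p}\bigr)}{1 - \Delta_{\sigma\xi_{\infty,p}}\bigl(\nu_p,\ c_p\sigma\xi_{\infty,p}\bigr)}\,\Phi_{\infty,p}(dz), \tag{48}$$

where $\nu_p = \gamma_p + \bigl(Q[p:p]^{-1}Q[p:\neg p]\gamma[\neg p]\bigr)_p$. [Note that $\beta^{(p)} = \lim_{n\to\infty}\sqrt{n}\,A\bigl(\eta_n(p) - \theta - \gamma/\sqrt{n}\bigr)$ because $\theta \in M_p$, and that $\nu_p = \lim_{n\to\infty}\sqrt{n}\,\eta_{n,p}(p)$ in case $\theta \in M_{p-1}$, that is, $p > p_0(\theta)$. Here $\eta_n(p)$ is defined as in (9), but with $\theta + \gamma/\sqrt{n}$ replacing $\theta$.]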
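The bracketed note can be checked by a direct computation. The derivation below is a sketch that assumes $\eta_n(p)$ in (9) is the mean of the restricted least-squares estimator $\tilde\theta(p)$, that is, $\eta_n(p) = \bigl(((X[p]'X[p])^{-1}X[p]'X\vartheta)', 0'\bigr)'$ with $\vartheta = \theta + \gamma/\sqrt{n}$; this reading of (9) is our assumption, not a quotation of it.

```latex
% Assumption: \eta_n(p) = ( ((X[p]'X[p])^{-1} X[p]' X \vartheta)', 0' )',  \vartheta = \theta + \gamma/\sqrt{n}.
% Since \theta \in M_p, \vartheta[\neg p] = \gamma[\neg p]/\sqrt{n}, and Q = \lim X'X/n gives:
\begin{aligned}
\sqrt{n}\,\bigl(\eta_n(p) - \vartheta\bigr)
  &= \begin{pmatrix} (X[p]'X[p]/n)^{-1}(X[p]'X[\neg p]/n)\,\gamma[\neg p] \\ -\gamma[\neg p] \end{pmatrix}
  \;\longrightarrow\;
  \begin{pmatrix} Q[p:p]^{-1}Q[p:\neg p]\,\gamma[\neg p] \\ -\gamma[\neg p] \end{pmatrix},\\
\sqrt{n}\,A\bigl(\eta_n(p) - \theta - \gamma/\sqrt{n}\bigr) &\longrightarrow \beta^{(p)},\\
\sqrt{n}\,\eta_{n,p}(p) &= \gamma_p + \bigl((X[p]'X[p]/n)^{-1}(X[p]'X[\neg p]/n)\,\gamma[\neg p]\bigr)_p
  \longrightarrow \nu_p \qquad (\theta \in M_{p-1},\ \text{so } \theta_p = 0).
\end{aligned}
```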

If $p > 0$ and if the matrix $A[p]$ has full row rank $k$, then the Lebesgue density $\phi_{\infty,p}(\cdot)$ of $\Phi_{\infty,p}(\cdot)$ exists; the density of (47) is then given by $\phi_{\infty,p}(t - \beta^{(p)})$, while the density of (48) is given by the integrand in (48) times $\phi_{\infty,p}(z)$, evaluated at $z = t - \beta^{(p)}$.
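For $k = 1$, (48) is a one-dimensional integral that is easy to evaluate numerically. The sketch below uses a plain midpoint rule and writes $\Delta_s(a, c)$ for the probability that $|a + sN| < c$ with $N$ standard normal, which is how we read the $\Delta$ notation of Section 2 (an assumption). The argument `omega` is the standard deviation of $\Phi_{\infty,p}$, i.e. $(\sigma^2A[p]Q[p:p]^{-1}A[p]')^{1/2}$. When $b_{\infty,p} = 0$ and $\zeta_{\infty,p} = \xi_{\infty,p}$ (the case $C_\infty^{(p)} = 0$), the weight in the integrand equals the denominator and the Gaussian c.d.f. (47) is recovered.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal


def delta(s, a, c):
    """P(|a + s*N| < c) for N ~ N(0, 1); for s = 0 this is the indicator of |a| < c."""
    if s == 0.0:
        return 1.0 if abs(a) < c else 0.0
    return STD.cdf((c - a) / s) - STD.cdf((-c - a) / s)


def limit_cdf_48(t, sigma, xi, zeta, b, omega, c_p, nu_p, beta, steps=20000):
    """Midpoint-rule approximation of (48) for k = 1.

    omega is the standard deviation of Phi_{infty,p}; the lower tail of the
    integral is truncated at -12*omega, where the Gaussian mass is negligible.
    """
    denom = 1.0 - delta(sigma * xi, nu_p, c_p * sigma * xi)
    lower, upper = -12.0 * omega, t - beta
    if upper <= lower:
        return 0.0
    h = (upper - lower) / steps
    phi = NormalDist(0.0, omega)
    total = 0.0
    for i in range(steps):
        z = lower + (i + 0.5) * h
        weight = 1.0 - delta(sigma * zeta, nu_p + b * z, c_p * sigma * xi)
        total += weight * phi.pdf(z)
    return total * h / denom
```

The parameters must be mutually consistent for the result to be a c.d.f.: for $k = 1$ one needs $b^2\omega^2 + \sigma^2\zeta^2 = \sigma^2\xi^2$, which holds, e.g., for $\sigma = 1$, $\xi^2 = \omega^2 = 4/3$, $b = -1/2$, $\zeta = 1$ (the values obtained from $Q$ with unit diagonal and off-diagonal $0.5$, $A = (1, 0)$, $p = 2$).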

While the limiting c.d.f. in (47) is Gaussian, the limiting c.d.f. in (48) typically is not, an exception being the case where $C_\infty^{(p)} = 0$, that is, when $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ are asymptotically uncorrelated. In that case, the expressions in (47) and (48) coincide. Also note that the c.d.f. $G_{\infty,\theta,\sigma,\gamma}(t|p)$ has been defined above only for $\theta \in M_p$ (and $O \le p \le P$). If $\gamma = 0$, we write $G_{\infty,\theta,\sigma}(t|p)$ as shorthand for $G_{\infty,\theta,\sigma,0}(t|p)$ in the following.

Proposition A.1 is restricted to sequences of parameters $\theta + \gamma/\sqrt{n}$ with $p_0(\theta) \le p$. The case where the selected model $M_p$ is an incorrect model for $\theta$, that is, where $p_0(\theta) > p$, is analyzed in [10], Proposition 5.1; see also the discussion following Corollary 5.4 in that reference. For the results in the present paper, however, we shall only need to rely on the situation covered by Proposition A.1. The reason essentially is that only over $1/\sqrt{n}$-"neighborhoods" of $M_p$ is the probability of actually selecting the model $M_p$ bounded away from zero. In contrast, for every fixed $\theta \notin M_p$, the probability of selecting the model $M_p$ converges to zero as $n \to \infty$.

Proposition A.2. Let $p$ satisfy $O \le p \le P$, and let $r_n$ be a sequence of positive real numbers.

(a) If $r_n = O(1/\sqrt{n})$ as $n \to \infty$, then

$$\liminf_{n\to\infty}\ \inf_{\vartheta\in\mathbb{R}^P:\ \|\vartheta[\neg p]\| < r_n} P_{n,\vartheta,\sigma}(\hat p = p) > 0 \tag{49}$$

holds for every $\sigma$, $0 < \sigma < \infty$. (The infimum in the above display is to be interpreted as extending over $\|\vartheta\| < r_n$ if $p = 0$ and over all of $\mathbb{R}^P$ if $p = P$.) In particular, it follows that $\liminf_{n\to\infty}\inf_{\vartheta\in\mathbb{R}^P:\ \|\vartheta-\theta\| < r_n} P_{n,\vartheta,\sigma}(\hat p = p) > 0$ for each $\theta \in M_p$ and $0 < \sigma < \infty$.

(b) Suppose $p < P$ holds. If $\sqrt{n}\,r_n \to \infty$ as $n \to \infty$, then

$$\lim_{n\to\infty}\ \inf_{\vartheta\in\mathbb{R}^P:\ \|\vartheta-\theta\| < r_n} P_{n,\vartheta,\sigma}(\hat p = p) = 0 \tag{50}$$

for each $\theta \in M_p$ and $0 < \sigma < \infty$.

(c) If an infimum (resp. supremum) over $\sigma \in [\sigma_*, \sigma^*]$, $0 < \sigma_* \le \sigma^* < \infty$, is inserted in (49) [resp. (50)] immediately after the liminf (resp. lim) operator, the result continues to hold.

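Proof. Let $\vartheta^{(n)}$ be an arbitrary sequence of parameters in $\mathbb{R}^P$. Proposition 5.4 in Leeb [11] together with Remark 5.5 in that reference show that any accumulation point of $P_{n,\vartheta^{(n)},\sigma}(\hat p = p)$ is of the form

$$\bigl(1 - \Delta_{\sigma\xi_{\infty,p}}(\nu_p, c_p\sigma\xi_{\infty,p})\bigr)\prod_{q=p+1}^{P}\Delta_{\sigma\xi_{\infty,q}}(\nu_q, c_q\sigma\xi_{\infty,q}) \tag{51}$$

in case $p > O$, and of the form

$$\prod_{q=p+1}^{P}\Delta_{\sigma\xi_{\infty,q}}(\nu_q, c_q\sigma\xi_{\infty,q}) \tag{52}$$

in case $p = O$. The quantities $\nu_q$, $q = p, \ldots, P$, in these displays are accumulation points of $\nu_q^{(n)} = \sqrt{n}\,\vartheta_q^{(n)} + \sqrt{n}\bigl((X[q]'X[q])^{-1}X[q]'X[\neg q]\vartheta^{(n)}[\neg q]\bigr)_q$ in $\mathbb{R}\cup\{-\infty,\infty\}$. (In case $q = P$ this expression is to be interpreted as $\sqrt{n}\,\vartheta_P^{(n)}$ by our conventions.) Observe that the expression in (51) is positive if and only if $|\nu_q| < \infty$ holds for each $q = p+1, \ldots, P$. The same is true for (52). [In case $p = P$, the expression in (51) is always positive.]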
To prove part (a), it suffices to show that any accumulation point of $P_{n,\vartheta^{(n)},\sigma}(\hat p = p)$ is positive whenever $\vartheta^{(n)}$ is a sequence satisfying $\|\vartheta^{(n)}[\neg p]\| < r_n$. In case $p = P$ it is easy to see that (51) reduces to $1 - \Delta_{\sigma\xi_{\infty,P}}(\nu_P, c_P\sigma\xi_{\infty,P})$, which is bounded from below by the positive constant $1 - \Delta_{\sigma\xi_{\infty,P}}(0, c_P\sigma\xi_{\infty,P})$. In case $p < P$ note that $\sqrt{n}\,\vartheta^{(n)}[\neg p]$ is a bounded sequence and, hence, $\nu_q^{(n)}$ is bounded for each $q = p+1, \ldots, P$. It follows that $|\nu_q| < \infty$ holds for $q = p+1, \ldots, P$. This completes the proof of part (a).

To prove part (b), let $\vartheta^{(n)}$ be given by $\vartheta^{(n)}[P-1] = \theta[P-1]$ and $\vartheta_P^{(n)} = r_n/2$. Since $\theta_P = 0$ for $\theta \in M_p$ with $p < P$, clearly $\|\vartheta^{(n)} - \theta\| = r_n/2 < r_n$ is satisfied. Moreover, $\sqrt{n}\,\vartheta_P^{(n)} = \sqrt{n}\,r_n/2$ converges to $\nu_P = \infty$. It follows that $\lim_{n\to\infty} P_{n,\vartheta^{(n)},\sigma}(\hat p = p) = 0$ whenever $p < P$.

Part (c) is proved analogously. □

APPENDIX B: PROOFS FOR SECTION 2.2.1 

In the proofs below it will be convenient to show the dependence of $\Phi_{n,p}(t)$ and $\Phi_{\infty,p}(t)$ on $\sigma$ in the notation. Thus, in the following we shall write $\Phi_{n,p,\sigma}(t)$ and $\Phi_{\infty,p,\sigma}(t)$ for the c.d.f. of a mean-zero $k$-variate Gaussian random vector with variance-covariance matrix $\sigma^2A[p](X[p]'X[p]/n)^{-1}A[p]'$ and $\sigma^2A[p]Q[p:p]^{-1}A[p]'$, respectively. For convenience, let $\Phi_{n,0,\sigma}(t)$ and $\Phi_{\infty,0,\sigma}(t)$ denote the c.d.f. of point-mass at zero in $\mathbb{R}^k$. The following lemma is elementary to prove, if we observe that in case $\operatorname{rank}(A[p]) = k$ the convergence $b_{n,p} \to b_{\infty,p}$ holds, since the generalized inverses in the definitions of these quantities then reduce to the usual inverse.

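Lemma B.1. Suppose $p > 0$ and that $\operatorname{rank}(A[p]) = k$. Define

$$S_{n,p}(z,\sigma) = \frac{1 - \Delta_{\sigma\zeta_{n,p}}(b_{n,p}z,\ c_p\sigma\xi_{n,p})}{1 - \Delta_{\sigma\xi_{n,p}}(0,\ c_p\sigma\xi_{n,p})}, \qquad S_{\infty,p}(z,\sigma) = \frac{1 - \Delta_{\sigma\zeta_{\infty,p}}(b_{\infty,p}z,\ c_p\sigma\xi_{\infty,p})}{1 - \Delta_{\sigma\xi_{\infty,p}}(0,\ c_p\sigma\xi_{\infty,p})}$$

for $z \in \mathbb{R}^k$ and $0 < \sigma < \infty$. Let $\sigma^{(n)}$ converge to $\sigma$, $0 < \sigma < \infty$. Then $S_{n,p}(z,\sigma^{(n)})$ converges to $S_{\infty,p}(z,\sigma)$ for every $z \in \mathbb{R}^k$ if $\zeta_{\infty,p} \ne 0$, and for every $z \in \mathbb{R}^k$ except possibly for $z$ satisfying $|b_{\infty,p}z| = c_p\sigma\xi_{\infty,p}$ if $\zeta_{\infty,p} = 0$. (The exceptional set has Lebesgue measure zero since $c_p\sigma\xi_{\infty,p} > 0$.)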

Lemma B.2. Let $(\Omega, \mathcal{A})$ and $(\Omega', \mathcal{B})$ be measurable spaces and let $\Psi: \Omega \to \Omega'$ be a measurable function. Suppose $\mu_n$ and $\mu$ are probability measures on $(\Omega, \mathcal{A})$ satisfying $\|\mu_n - \mu\|_{TV} \to 0$. Let $\rho_n$ be the probability measure induced by $\mu_n$ and $\Psi$, that is, $\rho_n(B) = \mu_n(\Psi^{-1}(B))$ for $B \in \mathcal{B}$. Then $\rho_n$ converges to a probability measure $\rho$ with respect to the total variation distance and $\rho$ is the measure induced by $\mu$ and $\Psi$.

Lemma B.2 follows immediately from $\|\rho_n - \rho\|_{TV} \le \|\mu_n - \mu\|_{TV}$. The following observation is useful in the proof of Proposition 2.1 below: Since the proposition depends on $Y$ only through its distribution (cf. Remark 4.6), we may assume without loss of generality that the errors in (5) are given by $u_t = \sigma\varepsilon_t$, $t \in \mathbb{N}$, with i.i.d. standard normal $\varepsilon_t$. In particular, all random variables involved are then defined on the same probability space.
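The inequality behind Lemma B.2 is easy to see for finitely supported measures, where $\sup_E|\mu(E) - \nu(E)| = \tfrac12\sum_\omega|\mu(\{\omega\}) - \nu(\{\omega\})|$: pushing both measures forward through the same map only merges atoms, and by the triangle inequality merging atoms cannot increase this sum. The Python sketch below illustrates this with exact rational arithmetic; the measures and the map `parity` are hypothetical examples, not objects from the paper.

```python
from collections import defaultdict
from fractions import Fraction as F


def tv_distance(mu, nu):
    """sup over events E of |mu(E) - nu(E)|, for finitely supported measures (dicts)."""
    support = set(mu) | set(nu)
    return F(1, 2) * sum(abs(mu.get(w, 0) - nu.get(w, 0)) for w in support)


def push_forward(mu, psi):
    """Image measure B -> mu(psi^{-1}(B)) of a finitely supported measure."""
    image = defaultdict(int)
    for w, mass in mu.items():
        image[psi(w)] += mass
    return dict(image)


def parity(w):
    """A hypothetical map Psi that merges the atoms {0, 2} and {1, 3}."""
    return w % 2


mu_n = {0: F(1, 4), 1: F(1, 4), 2: F(3, 10), 3: F(1, 5)}
mu = {0: F(1, 5), 1: F(3, 10), 2: F(7, 20), 3: F(3, 20)}
before = tv_distance(mu_n, mu)
after = tv_distance(push_forward(mu_n, parity), push_forward(mu, parity))
print(before, after)  # prints: 1/10 0
```

Here the distance drops strictly (from $1/10$ to $0$) because the map merges atoms whose differences have opposite signs; with a one-to-one map the two distances would coincide.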



Proof of Proposition 2.1. We consider first the case $p > O$ and assume for the moment that the matrix $A[p]$ has full row rank $k$. Then $\Phi_{n,p,\sigma}(\cdot)$ and $\Phi_{\infty,p,\sigma}(\cdot)$ possess densities $\phi_{n,p,\sigma}(\cdot)$ and $\phi_{\infty,p,\sigma}(\cdot)$, respectively, with respect to Lebesgue measure on $\mathbb{R}^k$. Since $\hat\sigma \to \sigma$ in $P_{n,\theta,\sigma}$-probability, each subsequence contains a further subsequence along which $\hat\sigma \to \sigma$ almost surely (with respect to the probability measure on the common probability space supporting all random variables involved), and we restrict ourselves to this further subsequence for the moment. In particular, we write $\{\hat\sigma \to \sigma\}$ for the event that $\hat\sigma$ converges to $\sigma$ along the subsequence under consideration; clearly, the event $\{\hat\sigma \to \sigma\}$ has probability one. Also note that we can assume without loss of generality that $\hat\sigma > 0$ holds on this event (at least from some data-dependent $n$ onward), since $\sigma > 0$ holds. Lemma B.1 now shows that on the event $\{\hat\sigma \to \sigma\}$ the function $S_{n,p}(z,\hat\sigma)\phi_{n,p,\hat\sigma}(z)$ converges to $S_{\infty,p}(z,\sigma)\phi_{\infty,p,\sigma}(z)$ for every $z$ except for a set of Lebesgue measure zero. Observe that both functions are probability densities with respect to Lebesgue measure on $\mathbb{R}^k$; see the discussion prior to Proposition 2.1. In view of Scheffé's lemma, they hence converge in absolute mean. By the same argument, $\phi_{n,p,\hat\sigma}(\cdot)$ also converges to $\phi_{\infty,p,\sigma}(\cdot)$ in absolute mean. Note that the absolute mean convergence of the densities translates into convergence in total variation for the corresponding c.d.f.s. Now $\hat G_n(t|p) = \Phi_{n,p,\hat\sigma}(t)$ in case the auxiliary procedure decides for $p_0(\theta) = p$, and $\hat G_n(t|p) = \int_{z \le t} S_{n,p}(z,\hat\sigma)\phi_{n,p,\hat\sigma}(z)\,dz$ otherwise. Since the auxiliary procedure decides consistently between $p_0(\theta) = p$ and $p_0(\theta) < p$ for every $\theta \in M_p$, it follows that (18) holds along the subsequence under consideration in case $p > O$ and if $A[p]$ has rank $k$. Since every subsequence contains such a further subsequence, this already proves (18) in case $p > O$ and $A[p]$ has rank $k$.

In case $p > O$ but where the matrix $A[p]$ does not have full row rank $k$, let $\hat G_n^{I}(t|p)$, $G_{n,\theta,\sigma}^{I}(t|p)$ and $G_{\infty,\theta,\sigma}^{I}(t|p)$ be defined in exactly the same way as $\hat G_n(t|p)$, $G_{n,\theta,\sigma}(t|p)$ and $G_{\infty,\theta,\sigma}(t|p)$, respectively, except that the $p \times P$ matrix $(I_p : 0)$ replaces $A$. Note that then $I_p$ replaces $A[p]$ (and that the value of $k$ changes to $p$). Since the matrix $I_p$ has full row rank $p$, the preceding paragraph shows that (18) holds with $\hat G_n^{I}(t|p)$ and $G_{\infty,\theta,\sigma}^{I}(t|p)$ replacing $\hat G_n(t|p)$ and $G_{\infty,\theta,\sigma}(t|p)$, respectively. But $\hat G_n(t|p)$ and $G_{\infty,\theta,\sigma}(t|p)$, respectively, are the c.d.f.s of the image measures of $\hat G_n^{I}(t|p)$ and $G_{\infty,\theta,\sigma}^{I}(t|p)$ induced by the linear mapping $x \mapsto A[p]x$, $x \in \mathbb{R}^p$. [This is obvious for $\hat G_n(t|p)$ because of its interpretation as the conditional c.d.f. $G^*_{n,\theta,\hat\sigma}(t|p)$ in (13) of [10] if $\hat\sigma > 0$; it is trivial if $\hat\sigma = 0$. Observe further that $G_{n,\theta,\sigma}(t|p)$ is clearly the c.d.f. of the induced measure obtained from the c.d.f. $G_{n,\theta,\sigma}^{I}(t|p)$. Since $G_{n,\theta,\sigma}^{I}(\cdot|p) \to G_{\infty,\theta,\sigma}^{I}(\cdot|p)$ and $G_{n,\theta,\sigma}(\cdot|p) \to G_{\infty,\theta,\sigma}(\cdot|p)$ with respect to the total variation distance for $\theta \in M_p$ by Proposition A.1, an application of Lemma B.2 shows that $G_{\infty,\theta,\sigma}(t|p)$ is indeed the c.d.f. of the induced measure obtained from the c.d.f. $G_{\infty,\theta,\sigma}^{I}(t|p)$.] Therefore, the total variation distance of $\hat G_n(t|p)$ and $G_{\infty,\theta,\sigma}(t|p)$ is bounded from above by that of $\hat G_n^{I}(t|p)$ and $G_{\infty,\theta,\sigma}^{I}(t|p)$. This proves (18) also in this case.

In the case $p = O > 0$, note that $G_{\infty,\theta,\sigma}(t|p)$ is given by (47) for $\theta \in M_p$. The result in (18) then follows in a similar way, observing that in case $A[p]$ has full row rank $k$ (again after passing to appropriate subsequences), $\phi_{n,p,\hat\sigma}(\cdot)$ converges to $\phi_{\infty,p,\sigma}(\cdot)$ in absolute mean on the event $\{\hat\sigma \to \sigma\}$ as defined above. The case where $p = O = 0$ is trivial, because both c.d.f.s in (18) coincide and are equal to the c.d.f. of point-mass at zero in $\mathbb{R}^k$. This completes the proof of (18).

The validity of (17) now follows for $\theta \in M_p$ since $G_{\infty,\theta,\sigma}(t|p)$ is then the limit of $G_{n,\theta,\sigma}(t|p)$ with respect to the total variation distance; see Proposition A.1. Finally, the claim regarding "conditional consistency" follows from (17) and (18) in view of Proposition A.2(a). □

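Proof of Corollary 2.2. Observe that

$$\begin{aligned}
P_{n,\theta,\sigma}\bigl(\|\hat G_n(\cdot|\hat p) - G_{n,\theta,\sigma}(\cdot|\hat p)\|_{TV} > \delta\bigr)
&= \sum_{p=O}^{P} P_{n,\theta,\sigma}\bigl(\|\hat G_n(\cdot|p) - G_{n,\theta,\sigma}(\cdot|p)\|_{TV} > \delta,\ \hat p = p\bigr)\\
&\le \sum_{O \le p < p_0(\theta)} P_{n,\theta,\sigma}(\hat p = p) + \sum_{\max\{p_0(\theta),O\} \le p \le P} P_{n,\theta,\sigma}\bigl(\|\hat G_n(\cdot|p) - G_{n,\theta,\sigma}(\cdot|p)\|_{TV} > \delta\bigr).
\end{aligned}$$

Each term in the first sum on the far right-hand side of the above display now obviously converges to zero (cf. [11], Corollary 5.6 and (5.7)), whereas every term in the second sum converges to zero by Proposition 2.1. □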

APPENDIX C: PROOFS FOR SECTIONS 2.2.2 AND 4.1 

Since the results in Section 2.2.2 rely on those in Section 4.1, the latter are proved first. Some of the proofs rely on auxiliary results collected in Appendix D. We start with some preparatory remarks. The total variation distance between $P_{n,\theta,\sigma}$ and $P_{n,\vartheta,\sigma}$ satisfies $\|P_{n,\theta,\sigma} - P_{n,\vartheta,\sigma}\|_{TV} \le 2\Phi\bigl(\|\theta - \vartheta\|\,\lambda_{\max}^{1/2}(X'X)/(2\sigma)\bigr) - 1$; furthermore, if $\theta^{(n)}$ and $\vartheta^{(n)}$ satisfy $\|\theta^{(n)} - \vartheta^{(n)}\| = O(n^{-1/2})$, the sequence $P_{n,\theta^{(n)},\sigma}$ is contiguous with respect to the sequence $P_{n,\vartheta^{(n)},\sigma}$. This follows exactly in the same way as Lemma A.1 in [16]. We also need the following lemma.

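Lemma C.1. Let $p$ satisfy $O < p \le P$. Suppose $\theta \in M_{p-1}$, $0 < \sigma < \infty$ and $0 < \rho < \infty$. Then

$$\begin{aligned}
\liminf_{n\to\infty}\ \inf_{\vartheta\in M_p:\ \|\vartheta-\theta\|<\rho/\sqrt{n}} P_{n,\vartheta,\sigma}(\hat p = p)
&= \bigl(1 - \Delta_{\sigma\xi_{\infty,p}}(0, c_p\sigma\xi_{\infty,p})\bigr)\prod_{q=p+1}^{P}\Delta_{\sigma\xi_{\infty,q}}(0, c_q\sigma\xi_{\infty,q})\\
&= 2\bigl(1 - \Phi(c_p)\bigr)\prod_{q=p+1}^{P}\bigl(2\Phi(c_q) - 1\bigr)\\
&= \lim_{n\to\infty} P_{n,\theta,\sigma}(\hat p = p) > 0.
\end{aligned}\tag{53}$$

Proof. We proceed similarly as in the proof of Proposition A.2, observing that now the quantities $\nu_q$, $q > p$, are all equal to zero since $\vartheta^{(n)} \in M_p$. Since $1 - \Delta_{\sigma\xi_{\infty,p}}(\nu_p, c_p\sigma\xi_{\infty,p})$ is minimal for $\nu_p = 0$, we see that the right-hand side of (53), which obviously is positive, is a lower bound for the left-hand side. Using (5.7) in [11] and observing that $\theta \in M_{p-1}$ completes the proof. □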
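The closed form in (53) rests on the identities $1 - \Delta_{\sigma\xi}(0, c\sigma\xi) = 2(1 - \Phi(c))$ and $\Delta_{\sigma\xi}(0, c\sigma\xi) = 2\Phi(c) - 1$, in which $\sigma$ and $\xi$ cancel. The Python sketch below evaluates the accumulation-point formula (51) for given $\nu_p, \ldots, \nu_P$ and compares it with that closed form. It reads $\Delta_s(a, c)$ as the probability that $|a + sN| < c$ and sets $\Delta = 0$ for an infinite first argument; both conventions are our assumptions. It also shows that (51) vanishes once some $\nu_q$, $q > p$, is infinite.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def delta(s, a, c):
    """P(|a + s*N| < c) with N standard normal and s > 0; an infinite a gives 0."""
    if math.isinf(a):
        return 0.0
    return STD.cdf((c - a) / s) - STD.cdf((-c - a) / s)


def selection_limit_51(nus, cs, xis, sigma):
    """Expression (51) for p > O; nus, cs, xis hold nu_q, c_q, xi_{infty,q} for q = p, ..., P."""
    value = 1.0 - delta(sigma * xis[0], nus[0], cs[0] * sigma * xis[0])
    for nu, c, xi in zip(nus[1:], cs[1:], xis[1:]):
        value *= delta(sigma * xi, nu, c * sigma * xi)
    return value


def closed_form_53(cs):
    """2(1 - Phi(c_p)) * prod over q > p of (2 Phi(c_q) - 1)."""
    value = 2.0 * (1.0 - STD.cdf(cs[0]))
    for c in cs[1:]:
        value *= 2.0 * STD.cdf(c) - 1.0
    return value
```

Since $\Delta_s(a, c)$ is largest at $a = 0$, moving $\nu_p$ away from zero can only increase (51); this is the minimality argument used in the proof of Lemma C.1.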



Proof of Theorem 4.1. We first prove (33) and (34). For this purpose, we make use of Lemma 3.1 in [16] with $\alpha = \theta \in M_{p-1}$, $B = M_p$, $B_n = \{\vartheta \in M_p : \|\vartheta - \theta\| < \rho_0 n^{-1/2}\}$, $\beta = \vartheta$, $\varphi_n(\beta) = G_{n,\vartheta,\sigma}(t|p)$ and $\hat\varphi_n = \hat G_n(t|p)$, where $\rho_0$, $0 < \rho_0 < \infty$, will be chosen shortly (and $\sigma$ is held fixed). The contiguity assumption of this lemma is satisfied in view of the preparatory remark above. It hence only remains to show that there exists a value of $\rho_0$, $0 < \rho_0 < \infty$, such that $\delta^*$ in Lemma 3.1 of [16] [which represents the limit inferior of the oscillation of $\varphi_n(\cdot)$ over $B_n$] is positive. Applying Lemma 3.5(a) of [16] with $\zeta_n = \rho_0 n^{-1/2}$ and the set $G_0$ equal to the set $G$, it remains, in light of Proposition A.1, to show that there exists a $\rho_0$, $0 < \rho_0 < \infty$, such that $G_{\infty,\theta,\sigma,\gamma}(t|p)$ as a function of $\gamma$ is nonconstant on the set $\{\gamma \in M_p : \|\gamma\| < \rho_0\}$. In view of Lemma 3.1 of [16], the corresponding $\delta_0$ can then be chosen as any positive number less than one-half of the oscillation of $G_{\infty,\theta,\sigma,\gamma}(t|p)$ over this set. That such a $\rho_0$ indeed exists follows from Lemma D.1 in Appendix D. Furthermore, observe that $G_{\infty,\theta,\sigma,\gamma}(t|p)$ is given by (48) for $\theta \in M_{p-1}$ and, hence, does not depend on $\theta$, but only on $t$, $Q$, $A$, $\sigma$ and $c_p$. As a consequence, $\rho_0$ and $\delta_0$ can be chosen such that they also depend only on these quantities. This completes the proof of (33) and (34).

To prove (35), we use Corollary 3.4 in [16] with the same identification of notation as above, with $\zeta_n = \rho_0 n^{-1/2}$, and with $V = M_p$ (viewed as a vector space isomorphic to $\mathbb{R}^p$). The asymptotic uniform equicontinuity condition in that corollary is then satisfied in view of $\|P_{n,\theta,\sigma} - P_{n,\vartheta,\sigma}\|_{TV} \le 2\Phi\bigl(\|\theta - \vartheta\|\,\lambda_{\max}^{1/2}(X'X)/(2\sigma)\bigr) - 1$. Applying Corollary 3.4 in [16] then establishes (35). This completes the proof of part (a).

Part (b) is proved exactly as part (a), making additional use of Corollary C.2 and Remark C.1 in [16]. The events $E_n$ appearing in this corollary are given here by $\{\hat p = p\}$. Clearly, $P_{n,\vartheta,\sigma}(\hat p = p)$ is always positive. The constant $M$ in Corollary C.2 of [16] is now given by the right-hand side of (53) above. □
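The total variation bound used in the equicontinuity step, and stated at the beginning of this appendix, combines two facts: the laws $N(\mu_1, \sigma^2 I)$ and $N(\mu_2, \sigma^2 I)$ are at total variation distance exactly $2\Phi(\|\mu_1 - \mu_2\|/(2\sigma)) - 1$, and $\|X(\theta - \vartheta)\| \le \lambda_{\max}^{1/2}(X'X)\,\|\theta - \vartheta\|$. The one-dimensional Python sketch below illustrates only the first fact, comparing the formula with a midpoint-rule evaluation of $\int (f - g)^+$, which equals $\sup_E |P(E) - P'(E)|$ (Scheffé's identity). The test values are arbitrary.

```python
from statistics import NormalDist


def tv_gaussian_formula(d, sigma):
    """Total variation distance between N(0, sigma^2) and N(d, sigma^2)."""
    return 2.0 * NormalDist().cdf(abs(d) / (2.0 * sigma)) - 1.0


def tv_gaussian_numeric(d, sigma, steps=50000):
    """Midpoint-rule value of the integral of (f - g)^+ for the same two densities."""
    f, g = NormalDist(0.0, sigma), NormalDist(d, sigma)
    lo = min(0.0, d) - 12.0 * sigma
    hi = max(0.0, d) + 12.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += max(f.pdf(x) - g.pdf(x), 0.0)
    return total * h


for d, s in [(0.5, 1.0), (2.0, 1.5), (-1.0, 0.4)]:
    print(d, s, tv_gaussian_formula(d, s), tv_gaussian_numeric(d, s))
```

In the regression setting the relevant mean difference is $X(\theta - \vartheta)$; since $\Phi$ is increasing, replacing $\|X(\theta - \vartheta)\|$ by its upper bound $\lambda_{\max}^{1/2}(X'X)\|\theta - \vartheta\|$ yields the displayed inequality.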

Proof of Theorem 4.2. We again use results from [16], this time with the identification $\alpha = \theta \in M_p$, $B = M_{q^*}$, $B_n = \{\vartheta \in M_{q^*} : \|\vartheta - \theta\| < \rho_0 n^{-1/2}\}$, $\beta = \vartheta$, $\varphi_n(\beta) = G_{n,\vartheta,\sigma}(t|p)$, $\hat\varphi_n = \hat G_n(t|p)$, $V = M_{q^*}$, and $\zeta_n = \rho_0 n^{-1/2}$ (again $\sigma$ is held fixed). The proof of part (a) is then similar to the proof of part (a) of Theorem 4.1, except for using Lemma D.2 instead of Lemma D.1, and except for the fact that the argument that $\rho_0$ and $\delta_0$ only depend on $t$, $Q$, $A$, $\sigma$ and $c_p$ is now slightly more complex, since $G_{\infty,\theta,\sigma,\gamma}(t|p)$ now depends on $\theta$. However, observe that $G_{\infty,\theta,\sigma,\gamma}(t|p)$ as a function of $\theta \in M_p$ can follow only two different formulae which themselves do not depend on $\theta$; see (47) and (48).

Part (b) is proved exactly as the corresponding part of Theorem 4.1, except that positivity of the constant $M = \liminf_{n\to\infty}\inf_{\vartheta\in M_{q^*}:\ \|\vartheta-\theta\|<\rho/\sqrt{n}} P_{n,\vartheta,\sigma}(\hat p = p)$ now follows since $M$ is bounded from below by the expression in part (a) of Proposition A.2. □

Proof of Proposition 4.3. See [14]. □ 

Proof of Proposition 4.4. That part (a) implies part (b) follows from (20) in [10], observing that $C_n^{(q)} \to 0$ and that $\sqrt{n}\,\eta_{n,q}(q)$ converges to a finite limit. The reverse implication follows by passing to the limit in (20) of [10] and observing that, by suitable choice of $\gamma \in \mathbb{R}^P$, the limit of $\bigl(\sqrt{n}\,\eta_{n,p+1}(p+1), \ldots, \sqrt{n}\,\eta_{n,P}(P)\bigr)'$ can take on the value of every standard basis vector in $\mathbb{R}^{P-p}$. To prove the equivalence of parts (a) and (c), we use Proposition 3.1 in [10] and equation (19) in that paper to obtain $\sum_{r=1}^{q}\sigma^2\xi_{\infty,r}^{-2}C_\infty^{(r)}C_\infty^{(r)\prime}$ as the formula for the asymptotic variance-covariance matrix of $\sqrt{n}A\tilde\theta(q)$. Since the terms in this sum are nonnegative definite, the equivalence follows. The final claims regarding the cases $p = P$ and $p = 0$ are either obvious or follow immediately from the representation of the asymptotic variance-covariance matrix of $\sqrt{n}A\tilde\theta(q)$ just given. □

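Proof of Theorem 2.3. In view of the definition of $G_{n,\vartheta,\sigma}(t|\hat p)$, we have

$$\bigl|\hat G_n(t|\hat p) - G_{n,\vartheta,\sigma}(t|\hat p)\bigr| = \sum_{p=O}^{P}\bigl|\hat G_n(t|p) - G_{n,\vartheta,\sigma}(t|p)\bigr|\,\mathbf{1}(\hat p = p) \ge \bigl|\hat G_n(t|q^*) - G_{n,\vartheta,\sigma}(t|q^*)\bigr|\,\mathbf{1}(\hat p = q^*).$$

Hence, for every $\vartheta \in \mathbb{R}^P$ and every $\delta > 0$,

$$P_{n,\vartheta,\sigma}\bigl(|\hat G_n(t|\hat p) - G_{n,\vartheta,\sigma}(t|\hat p)| > \delta\bigr) \ge P_{n,\vartheta,\sigma}\bigl(|\hat G_n(t|\hat p) - G_{n,\vartheta,\sigma}(t|q^*)| > \delta \,\big|\, \hat p = q^*\bigr)\,P_{n,\vartheta,\sigma}(\hat p = q^*),$$

observing that the conditional probabilities are well defined since $P_{n,\vartheta,\sigma}(\hat p = q^*)$ is always positive (cf. [11], Section 3.2). This implies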

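$$\begin{aligned}
&\liminf_{n\to\infty}\ \sup_{\vartheta\in M_{q^*}:\ \|\vartheta-\theta\|<\rho_0/\sqrt{n}} P_{n,\vartheta,\sigma}\bigl(|\hat G_n(t|\hat p) - G_{n,\vartheta,\sigma}(t|\hat p)| > \delta\bigr)\\
&\quad\ge \liminf_{n\to\infty}\ \sup_{\vartheta\in M_{q^*}:\ \|\vartheta-\theta\|<\rho_0/\sqrt{n}} P_{n,\vartheta,\sigma}\bigl(|\hat G_n(t|\hat p) - G_{n,\vartheta,\sigma}(t|q^*)| > \delta \,\big|\, \hat p = q^*\bigr)\\
&\qquad\times \liminf_{n\to\infty}\ \inf_{\vartheta\in M_{q^*}:\ \|\vartheta-\theta\|<\rho_0/\sqrt{n}} P_{n,\vartheta,\sigma}(\hat p = q^*).
\end{aligned}\tag{54}$$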

Lemma C.1 above shows that

$$\liminf_{n\to\infty}\ \inf_{\vartheta\in M_{q^*}:\ \|\vartheta-\theta\|<\rho_0/\sqrt{n}} P_{n,\vartheta,\sigma}(\hat p = q^*) = 2\bigl(1 - \Phi(c_{q^*})\bigr)\prod_{q=q^*+1}^{P}\bigl(2\Phi(c_q) - 1\bigr) = \lim_{n\to\infty} P_{n,\theta,\sigma}(\hat p = q^*),$$

which obviously is positive. Suppose now that $\hat G_n(t|\hat p)$ satisfies (19). Then it also satisfies $P_{n,\theta,\sigma}\bigl(|\hat G_n(t|\hat p) - G_{n,\theta,\sigma}(t|q^*)| > \delta \,\big|\, \hat p = q^*\bigr) \to 0$, since the probability $P_{n,\theta,\sigma}(\hat p = q^*)$ of the conditioning event is bounded away from zero as just shown. Since $q^* > O$ is the maximal model order $q$ with the property that $C_\infty^{(q)} \ne 0$, the condition on $t$ in Theorem 4.1 is satisfied for every $t \in \mathbb{R}^k$. Hence, we may apply Theorem 4.1(b) with $p = q^*$ to the first term in the product on the right-hand side of (54), since $\hat G_n(t|\hat p)$ can certainly also be viewed as an estimator of $G_{n,\theta,\sigma}(t|q^*)$. This establishes (20) with the same $\delta_0$ and $\rho_0$ as in Theorem 4.1(b). Furthermore, note that (54) remains valid if an infimum extending over all estimators is inserted between the limit inferior and the supremum on both sides of (54). Again applying Theorem 4.1(b) with $p = q^*$ completes the proof of (21)-(22). □



Proof of Proposition 2.4. See [14]. □

APPENDIX D: AUXILIARY LEMMATA FOR APPENDIX C

Lemma D.1. Let $p$ satisfy $O < p \le P$, and assume that $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ are asymptotically correlated, that is, $C_\infty^{(p)} \ne 0$. Moreover, let $\theta \in M_{p-1}$, let $\sigma$ satisfy $0 < \sigma < \infty$ and let $t \in \mathbb{R}^k$ be such that the set $\{z \in \mathbb{R}^p : A[p]z \le t\}$ has positive Lebesgue measure in $\mathbb{R}^p$ (which is satisfied for all $t \in \mathbb{R}^k$ if, e.g., $\operatorname{rank}(A[p]) = k$). Then $G_{\infty,\theta,\sigma,\gamma}(t|p)$ is nonconstant as a function of $\gamma \in M_p$.

Lemma D.2. Let $p$ satisfy $O \le p < P$, assume that $A\tilde\theta(q)$ and $\tilde\theta_q(q)$ are asymptotically correlated, that is, $C_\infty^{(q)} \ne 0$, for some $q$ satisfying $p < q \le P$, and let $q^*$ denote the largest $q$ with this property. Moreover, let $t \in \mathbb{R}^k$, let $\theta \in M_p$ and let $\sigma$ satisfy $0 < \sigma < \infty$. Then $G_{\infty,\theta,\sigma,\gamma}(t|p)$ is nonconstant as a function of $\gamma \in M_{q^*}$.

Before we prove the above lemmata, we provide a representation of $G_{\infty,\theta,\sigma,\gamma}(t|p)$ for $p > 0$ that will be useful in the following. For $0 < p \le P$, define $Z_p = \sum_{r=1}^{p}\xi_{\infty,r}^{-2}C_\infty^{(r)}W_r$, where $C_\infty^{(r)}$ has been defined after (45) and the random variables $W_r$ are independent and normally distributed with mean zero and variances $\sigma^2\xi_{\infty,r}^2$. For convenience, let $Z_0$ denote the zero vector in $\mathbb{R}^k$. Observe that $Z_p$, $p > 0$, is normally distributed with mean zero and variance-covariance matrix $\sigma^2A[p]Q[p:p]^{-1}A[p]'$, since we have shown in the proof of Proposition 4.4 that the asymptotic variance-covariance matrix of $\sqrt{n}A\tilde\theta(p)$ can be expressed as $\sum_{r=1}^{p}\sigma^2\xi_{\infty,r}^{-2}C_\infty^{(r)}C_\infty^{(r)\prime}$. Also, the joint distribution of $Z_p$ and the set of variables $W_r$, $1 \le r \le P$, is normal, with the covariance vector between $Z_p$ and $W_r$ given by $\sigma^2C_\infty^{(r)}$ in case $r \le p$; otherwise $Z_p$ and $W_r$ are independent. Define the constants $\nu_r = \gamma_r + \bigl(Q[r:r]^{-1}Q[r:\neg r]\gamma[\neg r]\bigr)_r$ for $0 < r \le P$. It is now easy to see that $\beta^{(p)}$ defined in Proposition A.1 equals $-\sum_{r=p+1}^{P}\xi_{\infty,r}^{-2}C_\infty^{(r)}\nu_r$. [This is seen as follows: It was noted in Proposition A.1 that $\beta^{(p)} = \lim_{n\to\infty}\sqrt{n}\,A\bigl(\eta_n(p) - \theta - \gamma/\sqrt{n}\bigr)$ for $\theta \in M_p$, when $\eta_n(p)$ is defined as in (9), but with $\theta + \gamma/\sqrt{n}$ replacing $\theta$. Using the representation (20) of [10] and taking limits, the result follows if we observe that $\sqrt{n}\,\eta_{n,r}(r) \to \nu_r$ for $\theta \in M_p$ and $r > p$.] In view of (47), the c.d.f. $G_{\infty,\theta,\sigma,\gamma}(t|p)$ can now equivalently be written as

$$G_{\infty,\theta,\sigma,\gamma}(t|p) = P\Bigl(Z_p \le t + \sum_{r=p+1}^{P}\xi_{\infty,r}^{-2}C_\infty^{(r)}\nu_r\Bigr) \tag{55}$$

in case $p = \max\{p_0(\theta), O\} > 0$, and (55) trivially holds in case $p = 0$. In case $p > \max\{p_0(\theta), O\}$ the c.d.f. $G_{\infty,\theta,\sigma,\gamma}(t|p)$ is given by (48), and it is elementary but tedious to show, following the steps in Section 3.1 of [10], that this is equivalent to

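$$G_{\infty,\theta,\sigma,\gamma}(t|p) = P\Bigl(Z_p \le t + \sum_{r=p+1}^{P}\xi_{\infty,r}^{-2}C_\infty^{(r)}\nu_r \,\Big|\, |W_p + \nu_p| \ge c_p\sigma\xi_{\infty,p}\Bigr). \tag{56}$$

(This can also be derived from the fact that the distribution of $(Z_p', W_p + \nu_p, \ldots, W_P + \nu_P)'$ represents the limiting distribution of $\sqrt{n}\bigl((A(\tilde\theta(p) - \eta_n(p)))', \tilde\theta_p(p), \ldots, \tilde\theta_P(P)\bigr)'$ under $P_{n,\theta+\gamma/\sqrt{n},\sigma}$ with $\theta \in M_{p-1}$ [and $\eta_n(p)$ defined as in (9), but with $\theta + \gamma/\sqrt{n}$ replacing $\theta$].)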
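Two facts used above can be checked numerically for $k = 1$: that $Z_p$ has variance $\sigma^2A[p]Q[p:p]^{-1}A[p]'$, and that $\beta^{(p)} = -\sum_{r=p+1}^{P}\xi_{\infty,r}^{-2}C_\infty^{(r)}\nu_r$. The first rests on the decomposition $Q[p:p]^{-1} = \sum_{r=1}^{p}\xi_{\infty,r}^{-2}u_ru_r'$, where $u_r$ is $Q[r:r]^{-1}e_r$ padded with zeros; this follows from the partitioned-inverse formula by induction on $r$. The Python sketch below (with $\sigma = 1$) assumes, as before, that $\xi^2_{\infty,r}$ is the $(r,r)$ entry of $Q[r:r]^{-1}$. The helper names and the test values of $Q$, $A$ and $\gamma$ are illustrative only.

```python
def inverse(m):
    """Gauss-Jordan inverse of a small nonsingular square matrix (nested lists)."""
    n = len(m)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def block(Q, r):
    """Leading r x r block Q[r:r] (empty for r = 0)."""
    return [row[:r] for row in Q[:r]]


def c_and_xi2(Q, A, r):
    """C^{(r)} = A[r] Q[r:r]^{-1} e_r and (assumed) xi^2_r = (Q[r:r]^{-1})_{rr}."""
    inv = inverse(block(Q, r))
    return sum(A[j] * inv[j][r - 1] for j in range(r)), inv[r - 1][r - 1]


def correction(Q, gamma, r):
    """Q[r:r]^{-1} Q[r:not r] gamma[not r]; a length-r list (zeros if r = P)."""
    P = len(Q)
    rhs = [sum(Q[i][j] * gamma[j] for j in range(r, P)) for i in range(r)]
    inv = inverse(block(Q, r))
    return [sum(inv[i][j] * rhs[j] for j in range(r)) for i in range(r)]


def nu(Q, gamma, r):
    """nu_r = gamma_r + (Q[r:r]^{-1} Q[r:not r] gamma[not r])_r, with 1-based r."""
    return gamma[r - 1] + correction(Q, gamma, r)[r - 1]


def variance_direct(Q, A, p):
    """A[p] Q[p:p]^{-1} A[p]', the variance of Z_p when sigma = 1."""
    inv = inverse(block(Q, p))
    return sum(A[i] * inv[i][j] * A[j] for i in range(p) for j in range(p))


def variance_sum(Q, A, p):
    """Sum over r <= p of xi_r^{-2} (C^{(r)})^2."""
    total = 0.0
    for r in range(1, p + 1):
        C, xi2 = c_and_xi2(Q, A, r)
        total += C * C / xi2
    return total


def beta_direct(Q, A, gamma, p):
    """A (Q[p:p]^{-1} Q[p:not p] gamma[not p] ; -gamma[not p]); equals -A gamma for p = 0."""
    vec = correction(Q, gamma, p) + [-g for g in gamma[p:]]
    return sum(a * v for a, v in zip(A, vec))


def beta_sum(Q, A, gamma, p):
    """Minus the sum over r = p+1, ..., P of xi_r^{-2} C^{(r)} nu_r."""
    total = 0.0
    for r in range(p + 1, len(Q) + 1):
        C, xi2 = c_and_xi2(Q, A, r)
        total -= C / xi2 * nu(Q, gamma, r)
    return total
```

Both pairs of functions should agree up to rounding for any positive definite $Q$; the test uses one arbitrary $3 \times 3$ example and all admissible values of $p$, including the conventions $\beta^{(0)} = -A\gamma$ and $\beta^{(P)} = 0$.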

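Proof of Lemma D.1. Since $\theta \in M_{p-1}$, $G_{\infty,\theta,\sigma,\gamma}(t|p)$ is given by (56). For $\gamma \in M_p$, the quantities $\nu_{p+1}, \ldots, \nu_P$ are easily seen to be zero, while $\nu_p$ equals $\gamma_p$. This leads to

$$G_{\infty,\theta,\sigma,\gamma}(t|p) = P\bigl(Z_p \le t \,\big|\, |W_p + \gamma_p| \ge c_p\sigma\xi_{\infty,p}\bigr)$$

for $\gamma \in M_p$. Since $Z_p = Z_{p-1} + \xi_{\infty,p}^{-2}C_\infty^{(p)}W_p$, we obtain

$$G_{\infty,\theta,\sigma,\gamma}(t|p) = P\bigl(Z_{p-1} + \xi_{\infty,p}^{-2}C_\infty^{(p)}W_p \le t \,\big|\, |W_p + \gamma_p| \ge c_p\sigma\xi_{\infty,p}\bigr). \tag{57}$$

Assume now that (57) is constant in $\gamma_p \in \mathbb{R}$. Using Lemma D.3 below with $Z_{p-1} - t$, $W_p$, $-\xi_{\infty,p}^{-2}C_\infty^{(p)}$, $-\gamma_p$ and $c_p\sigma\xi_{\infty,p}$ replacing $Z$, $W$, $C$, $x$ and $\delta$, respectively, we obtain that either $P(Z_p \le t) = 0$ or that $\xi_{\infty,p}^{-2}C_\infty^{(p)} = 0$. By assumption of the lemma, the set $\{z \in \mathbb{R}^p : A[p]z \le t\}$ has positive Lebesgue measure. Hence, $P(Z_p \le t)$ must be positive. (To see why, note that $Z_p$ is concentrated in the column space of $A[p]$, and that $Z_p$ is nondegenerate within the column space of $A[p]$.) It would follow that $\xi_{\infty,p}^{-2}C_\infty^{(p)} = 0$, contradicting the assumption that $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ are asymptotically correlated. □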
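For $k = 1$ the dichotomy behind Lemma D.1 can be seen numerically from (57). In the Python sketch below, $Z_{p-1}$ is taken to be $N(0, s^2)$ with $s > 0$ (so $p \ge 2$ is implicitly assumed), `a` stands for $\xi_{\infty,p}^{-2}C_\infty^{(p)}$, $W_p \sim N(0, w^2)$ with $w = \sigma\xi_{\infty,p}$, and `thresh` is $c_p\sigma\xi_{\infty,p}$; all numerical values are illustrative. With $a \ne 0$ the conditional probability moves with $\gamma_p$, whereas with $a = 0$ it equals $P(Z_{p-1} \le t)$ for every $\gamma_p$.

```python
from statistics import NormalDist

STD = NormalDist()


def g57(gamma_p, t, s_prev, a, w_sd, thresh, steps=4000):
    """P(Z_{p-1} + a*W_p <= t | |W_p + gamma_p| >= thresh) for k = 1.

    Z_{p-1} ~ N(0, s_prev^2) and W_p ~ N(0, w_sd^2) are independent, s_prev > 0.
    Conditioning on W_p, the numerator is the integral of P(Z_{p-1} <= t - a*w)
    against the density of W_p outside the excluded interval (midpoint rule,
    tails truncated at 10 standard deviations).
    """
    w_dist = NormalDist(0.0, w_sd)
    lo, hi = -gamma_p - thresh, -gamma_p + thresh   # excluded interval for W_p
    num = 0.0
    for left, right in ((-10.0 * w_sd, lo), (hi, 10.0 * w_sd)):
        if right <= left:
            continue
        h = (right - left) / steps
        for i in range(steps):
            w = left + (i + 0.5) * h
            num += STD.cdf((t - a * w) / s_prev) * w_dist.pdf(w) * h
    den = 1.0 - (w_dist.cdf(hi) - w_dist.cdf(lo))
    return num / den


def oscillation(values):
    """Largest minus smallest value, as in the choice of delta_0 in Theorem 4.1."""
    return max(values) - min(values)
```

For $t = 0$, $s = a = w = \text{thresh} = 1$ the value at $\gamma_p = 2$ can also be obtained in closed form from $\int_u^v \Phi\phi = (\Phi(v)^2 - \Phi(u)^2)/2$, which gives approximately $0.4216$; by symmetry the value at $\gamma_p = -2$ is one minus that.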

Proof of Lemma D.2. By the assumptions on $q^*$, note that either $q^* = P$ or that $C_\infty^{(r)} = 0$ for each $r = q^*+1, \ldots, P$. Consider first the case $p = \max\{p_0(\theta), O\}$. By (55), we have $G_{\infty,\theta,\sigma,\gamma}(t|p) = P\bigl(Z_p \le t + \sum_{r=p+1}^{q^*}\xi_{\infty,r}^{-2}C_\infty^{(r)}\nu_r\bigr)$. Observe that $(\nu_{p+1}, \ldots, \nu_{q^*})'$ varies in all of $\mathbb{R}^{q^*-p}$ when $\gamma$ varies in $M_{q^*}$. Hence, the last mentioned probability goes to zero along an appropriate sequence of $(\nu_{p+1}, \ldots, \nu_{q^*})'$ (viz., a sequence along which at least one coordinate of $t + \sum_{r=p+1}^{q^*}\xi_{\infty,r}^{-2}C_\infty^{(r)}\nu_r$ goes to $-\infty$). Since $Z_{q^*} = Z_p + \sum_{r=p+1}^{q^*}\xi_{\infty,r}^{-2}C_\infty^{(r)}W_r$ and since the $W_r$, $r = p+1, \ldots, P$, are independent of $Z_p$, the c.d.f. $G_{\infty,\theta,\sigma,\gamma}(t|p)$ can also be written as a (regular) conditional probability

$$G_{\infty,\theta,\sigma,\gamma}(t|p) = P\bigl(Z_{q^*} \le t \,\big|\, W_{p+1} = -\nu_{p+1}, \ldots, W_{q^*} = -\nu_{q^*}\bigr). \tag{58}$$

Suppose now that $G_{\infty,\theta,\sigma,\gamma}(t|p)$ is constant in $\gamma \in M_{q^*}$, or equivalently, is constant when $(\nu_{p+1}, \ldots, \nu_{q^*})'$ varies in all of $\mathbb{R}^{q^*-p}$. It follows from the above discussion that the conditional probability in (58) is then zero for all $(\nu_{p+1}, \ldots, \nu_{q^*})' \in \mathbb{R}^{q^*-p}$. By integration with respect to the distribution of $(W_{p+1}, \ldots, W_{q^*})$, we obtain that $P(Z_{q^*} \le t) = 0$. From Proposition 4.4(c), it follows that $Z_{q^*}$ has a nonsingular normal distribution on $\mathbb{R}^k$, which contradicts $P(Z_{q^*} \le t) = 0$. This proves the lemma in case $p = \max\{p_0(\theta), O\}$. Consider next the case $p > \max\{p_0(\theta), O\}$ and assume that $G_{\infty,\theta,\sigma,\gamma}(t|p)$ is constant in $\gamma \in M_{q^*}$. Now $G_{\infty,\theta,\sigma,\gamma}(t|p)$ is given by (56). Letting $\gamma_p \to \infty$, $\nu_p$ converges to $\infty$ as well, and the expression in (56) converges to that in (55). Hence, (55) would have to be constant as a function of $(\nu_{p+1}, \ldots, \nu_{q^*})'$ (note that $(\nu_{p+1}, \ldots, \nu_{q^*})'$ depends only on $\gamma[\neg p]$ but not on $\gamma_p$), which already has been shown to lead to a contradiction. □



Lemma D.3. Let $Z$ be a random vector with values in $\mathbb{R}^k$, let $W$ be a univariate random variable independent of $Z$ and assume that $W$ has a Lebesgue density which is positive almost everywhere. Furthermore, let $C \in \mathbb{R}^k$ and let $\delta > 0$. Then $P(Z \le CW \mid |W - x| \ge \delta)$ is constant in $x \in \mathbb{R}$ if and only if $P(Z \le CW) = 0$ or $C = 0$.

Proof. If $C = 0$, then $P(Z \le CW \mid |W - x| \ge \delta)$ equals $P(Z \le 0)$, which is constant in $x$. If $P(Z \le CW) = 0$, obviously also $P(Z \le CW \mid |W - x| \ge \delta) = 0$, and hence, is constant in $x$. Conversely, assume that $P(Z \le CW \mid |W - x| \ge \delta) = P(Z \le CW \mid |W - x'| \ge \delta)$ for each $x, x' \in \mathbb{R}$. Letting $x' \to \infty$ implies that

$$\frac{P(Z \le CW,\ |W - x| \ge \delta)}{P(|W - x| \ge \delta)} = P(Z \le CW)$$

holds for each $x \in \mathbb{R}$. This is equivalent to

$$P(Z \le CW,\ W \in B) = P(Z \le CW)\,P(W \in B), \tag{59}$$

for all sets $B$ of the form $B = (x - \delta, x + \delta)$ with $x \in \mathbb{R}$. Since both sides in (59) are sigma-additive set functions and since $W$ is absolutely continuous with respect to Lebesgue measure, both set functions also agree on all sets of the form $(-\infty, x + \delta)$, and hence, on the entire Borel sigma-field on $\mathbb{R}$. This implies independence of $\{Z \le CW\}$ and $W$. In particular, we have

$$P(Z \le CW) = P(Z \le CW \mid W = w)$$

for almost all $w \in \mathbb{R}$. Furthermore, by the assumed independence of $Z$ and $W$, we have

$$P(Z \le CW) = P(Z \le CW \mid W = w) = P(Z \le Cw)$$

for almost all $w \in \mathbb{R}$. Now if $C \ne 0$, the right-hand side of the above display goes to zero either for $w \to \infty$ or for $w \to -\infty$, implying that $P(Z \le CW) = 0$. □

APPENDIX E: PROOFS FOR SECTIONS 3 AND 4.2



Proof of Theorem 4.5. After rearranging the elements of $\theta$ (and hence, the regressors) and correspondingly rearranging the columns of the matrix $A$ if necessary, we may assume without loss of generality that $r^* = (1, \ldots, 1, 0)$, and hence, that $i(r^*) = P$. That is, $M_{r^*} = M_{P-1}$ and $M_{r_{\mathrm{full}}} = M_P$. Furthermore, note that after this arrangement $C_\infty^{(P)} \ne 0$. Let $\hat p$ be the model selection procedure introduced in Section 2 with $O = P - 1$, $c_P = c$ and $c_O = 0$. Let $\tilde\theta$ be the corresponding post-model-selection estimator and let $G_{n,\theta,\sigma}(t|p)$ be as defined in Section 2.1. Condition (24) now implies the following: For every $\theta \in M_{P-1}$ which has exactly $P - 1$ nonzero coordinates,

$$\lim_{n\to\infty} P_{n,\theta,\sigma}\bigl(\{\hat r = r_{\mathrm{full}}\}\,\triangle\,\{\hat p = P\}\bigr) = \lim_{n\to\infty} P_{n,\theta,\sigma}\bigl(\{\hat r = r^*\}\,\triangle\,\{\hat p = P - 1\}\bigr) = 0 \tag{60}$$

holds for every $0 < \sigma < \infty$. Since the sequences $P_{n,\vartheta^{(n)},\sigma}$ and $P_{n,\theta,\sigma}$ are contiguous for $\vartheta^{(n)}$ satisfying $\|\theta - \vartheta^{(n)}\| = O(n^{-1/2})$, as remarked at the beginning of Appendix C, it follows that condition (60) continues to hold with $P_{n,\vartheta^{(n)},\sigma}$ replacing $P_{n,\theta,\sigma}$. This implies that, for every sequence of positive real numbers $s_n$ with $s_n = O(n^{-1/2})$, for every $\sigma$, $0 < \sigma < \infty$, and for every $\theta \in M_{P-1}$ which has exactly $P - 1$ nonzero coordinates,

$$\liminf_{n\to\infty}\ \inf_{\|\vartheta-\theta\|<s_n} P_{n,\vartheta,\sigma}(\hat r = r_{\mathrm{full}}) = \liminf_{n\to\infty}\ \inf_{\|\vartheta-\theta\|<s_n} P_{n,\vartheta,\sigma}(\hat p = P) > 0 \tag{61}$$

and

$$\liminf_{n\to\infty}\ \inf_{\|\vartheta-\theta\|<s_n} P_{n,\vartheta,\sigma}(\hat r = r^*) = \liminf_{n\to\infty}\ \inf_{\|\vartheta-\theta\|<s_n} P_{n,\vartheta,\sigma}(\hat p = P - 1) > 0 \tag{62}$$

hold, the positivity following from Proposition A.2. A further consequence is that

$$\sup_{\vartheta\in\mathbb{R}^P:\ \|\vartheta-\theta\|<s_n}\bigl\|K_{n,\vartheta,\sigma}(\cdot|r_{\mathrm{full}}) - G_{n,\vartheta,\sigma}(\cdot|P)\bigr\|_{TV} \to 0 \tag{63}$$

and

$$\sup_{\vartheta\in\mathbb{R}^P:\ \|\vartheta-\theta\|<s_n}\bigl\|K_{n,\vartheta,\sigma}(\cdot|r^*) - G_{n,\vartheta,\sigma}(\cdot|P-1)\bigr\|_{TV} \to 0 \tag{64}$$

as $n \to \infty$. From (63)-(64), we conclude that the limit of $K_{n,\theta+\gamma/\sqrt{n},\sigma}(\cdot|r_{\mathrm{full}})$ (with respect to the total variation distance) exists and coincides with $G_{\infty,\theta,\sigma,\gamma}(\cdot|P)$. Similarly, the limit of $K_{n,\theta+\gamma/\sqrt{n},\sigma}(\cdot|r^*)$ is $G_{\infty,\theta,\sigma,\gamma}(\cdot|P-1)$. Because of (61) and (62), we may assume that all relevant probabilities are positive (at least from a certain $n_0$ onward). Repeating the proof of Theorem 4.1 with $p = P$ and where $K_{n,\vartheta,\sigma}(t|r_{\mathrm{full}})$ replaces $G_{n,\vartheta,\sigma}(t|P)$, as well as repeating the proof of Theorem 4.2 with $p = P - 1 = O$, $q^* = P$ and where $K_{n,\vartheta,\sigma}(t|r^*)$ replaces $G_{n,\vartheta,\sigma}(t|P-1)$, gives the desired result. □

Proof of Theorem 3.1. Observe that (60)-(64) again hold after rearranging coordinates as in the previous proof and that

$$\lim_{n\to\infty} P_{n,\theta,\sigma}(\hat r = r_{\mathrm{full}}) = \lim_{n\to\infty} P_{n,\theta,\sigma}(\hat p = P) > 0, \qquad \lim_{n\to\infty} P_{n,\theta,\sigma}(\hat r = r^*) = \lim_{n\to\infty} P_{n,\theta,\sigma}(\hat p = P - 1) > 0.$$

Repeating the proof of Theorem 2.3 with $q^* = P$, with $K_{n,\vartheta,\sigma}(t|\hat r)$ replacing $G_{n,\vartheta,\sigma}(t|\hat p)$, and using Theorem 4.5(b) instead of Theorem 4.1(b) gives the desired result. □